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SECTION  I 


INTRODUCTION 


This  is  the  final  report  for  research  in  the  area  of  Computation  of  Hypersonic  Interference 
Flowfields  for  the  U.S.  Air  Force.  Basic  guidelines  for  the  fiow  regime  of  concern  was  a  fiee- 
stream  Mach  number  of  10  and  less  at  an  altitude  of  100,000  feet  and  below.  A  significant  por¬ 
tion  of  this  research  was  associated  with  solving  hypersonic  flow  problems  in  chemical  equilibri¬ 
um.  This  hypersonic  chemical  equilibrium  flow  effort  was  sufficiently  large  and  self-contained 
that  a  separate  report  was  devoted  to  this  task.  This  report  is  entitled  “An  Efficient  Solver  for 
Flows  in  Local  Chemical  Equilibrium”  and  it  will  be  published  as  an  AFATL  report. 

This  report  contains  the  equations  solved,  the  numerical  cell  face  flux  formulation  used,  the 
numerical  methods  developed  for  solving  the  system  of  discretized  equations,  a  discussion  of 
software  optimization,  and  example  results.  The  equations  solved  are  the  compressible  form  of 
the  three-dimensional  unsteady  Euler  and  Navi^-Stokes  equations,  and  these  equations  are  dis¬ 
cussed  in  Section  n.  The  numerical  flux  formulation  used  here  is  neither  a  Monotone  Up¬ 
stream-centered  Schemes  for  Conservation  Laws  (MUSCL)  approach,  which  is  a  dependent 
variable  extrapolation  method,  nor  a  flux  extrapolation  method;  rather  it  falls  somewhere  in  be¬ 
tween.  The  development  of  the  present  numerical  flux  formulation  and  how  it  differs  from  the 
dependent  variable  extrtqralation  and  flux  extrapolation  methods  is  presented  in  Section  in. 

During  the  course  of  this  research  various  numerical  solution  schemes  were  developed  and 
applied  to  the  solution  of  the  equations  for  both  sready  and  unsteady  flow.  This  was  a  rather  im¬ 
portant  aspect  of  the  research  because  it  was  discovered  that  the  previously  used  numerical  solu¬ 
tion  scheme,  the  so-called  two-pass  scheme,  had  difficulties  in  solving  the  equations  on  grids 
that  were  clustered  on  both  ends  of  a  computational  coordinate  direction.  For  simple  external 
flow  problems  where  the  grid  is  typically  clustoed  near  the  the  body  and  stretched  as  the  dis¬ 
tance  from  the  body  increases,  the  old  two-pass  scheme  seemed  to  worit  rather  well.  But  for  a 
situation  where  the  grid  was  clustered  on  both  ends  of  a  computational  coordinate  such  as  might 
occur  along  a  grid  line  that  runs  from  a  store  to  the  fuselage,  for  example,  numerical  difficulties 
were  encountered.  This  was  particulariy  noticeable  for  Navier-Stokes  grids  where  the  clustering 
could  be  severe.  A  new  scheme,  referred  to  as  the  modified  two^ass  scheme,  has  provided  a 
method  for  solving  such  problems  for  all  cases  considered  thus  far.  In  addition,  an  extension  of 
this  modified  two-pass  scheme  was  developed  and  it  is  referred  to  as  the  N-pass  scheme.  A  re¬ 
view  of  the  old  two-^ass  scheme  and  the  development  of  these  new  numerical  solution  schemes 
are  presented  in  Section  IV. 
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The  numerical  solution  schemes  presented  in  Section  IV  have  developed  into  rather  work¬ 
horse  methods  for  solving  the  three-dimensional  Euler  and  Navier-Stokes  equations  for  reason¬ 
ably  complex  configurations.  Because  of  the  collective  amount  of  computer  resources  being  de¬ 
voted  K)  solving  such  problems,  an  effort  was  directed  toward  investigating  ways  of  improving 
the  computational  efficiency  of  these  numerical  solution  schemes.  Results  of  this  effort  are  pres¬ 
ented  in  Section  V. 

Numerous  computations  have  been  performed  using  the  computational  techniques  discussed 
in  this  report  Selected  examples  of  the  various  results  are  given  in  Section  VI.  Concluding  re  - 
marks  are  given  in  Section  Vn. 
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SECTION  n 


C30VERNING  EQUATIONS 
1.  NAVIER-STOKES  EQUATIONS 


The  equations  of  motion  which  describe  the  behavior  of  a  continuum  fluid  flow  are  the  Nav- 
iCT-Stokes  equations.  In  the  absence  of  body  forces  and  heat  sources,  the  conservation  laws  for 
mass,  momentum,  and  energy  of  a  perfect  gas  over  a  stationary  finite  volume  Q,  enclosed  by  a 
control  surface  are  expressed  by  the  following  integral  form  of  the  non-dimensional  Navier- 
Stokes  equations 


where 


F(q)  *  nds  + 


(q,Vq)  •  nds  =  0 


q  =  [  e.Qu.ev,ew,e]'^ 
F  =  [  f  .g  ,h  f 
Fy  =  [  ?»  tMv 


The  convected  flux  vectors  are 


f  =  [  Qu  .  +  P  .  0«v  ,  euw  ,  u(e  +  p)  ] 

I  =  [  ev  ,  Quv  ,  Qv^  +  p  ,  Qvw  ,  v(e  +  p)  ] 

H  =  [  QW  ,  QUW  ,  QVW  ,  QW^  +  p  ,  w(e  +  p)  ]  ^ 

e  =  ej  +  ^Q  (u^  +  v^  +  w^) 

The  vectors  of  diffusive  terms  are 


[  0  ,  Txx  «  txy  »  f  "i"  '^xyV  +  XxzW  qx  ]  ^ 

^  [Of  1<yx  »  ^yy  »  ^yz  *  “^yx®  ^yy^  "tyzW  ~  qy  ] 

Hy  “  [  0  ,  Xjx  »  Xjy  ,  Tn  ,  TzxU  +  TxyV  +  Xj^W  ~  qz  ] 

L 

The  viscous  stress  and  heat  flux  terms  ate 


(1) 
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“txx 

II 

+  X)  Ux 

+  X(Vy 

-I- 

Wz) 

Xyy 

=  (2|i 

+  X)  Vy 

,  +  X(Ux 

-1- 

Wz) 

Tji 

=  (2|i 

+  X)  w 

Z  +  X(Ux 

-1- 

Vy) 

“Cxy 

=  Xyx 

=  H  (Uy 

+  Vx) 

‘'^xx 

«  Xxx 

=  |A  (Uz 

+  Wx) 

Xyx 

= 

=  (Vz 

+  Wy) 

qx  = 

qy  = 
qx  = 


_  ifil 

_  K-il 

By 

iril 


and 


K  = 


(Y  -  l)Pr 


where  K  is  the  thermal  conductivity,  and  T  is  the  temperature. 

Here  conventional  definitions  of  the  flow  quantities  are  used.  The  Cartesian  velocities  com¬ 
ponents  (u,  v,w)  are  normalized  by  die  fine-stream  speed  of  sound  a  « ,  e  is  normalized  by  q  «d  , 
the  internal  energy  ei ,  the  total  energy  e  and  tire  pressure  p  are  normalized  by  q«o  a«.  The  two 
viscosity  coefificients  X  and  p  are  normalized  by  the  molecular  viscosity  (loo.  The  constant  y  is 
the  ratio  of  specific  heats,  Re^  is  the  Reynolds  number  based  on  free-stream  veloci^  and  refer¬ 
ence  length  L,  and  Pr  is  the  Prandtl  number.  For  a  perfect  gas  the  normalized  state  relations  are 

P  “  Y  ®  T  ,  ej  =  y(y  -  1)  ’ 

where  the  temperature  T  is  normalized  with  respect  to  T  «e.  The  system  of  equations  (Equation 
1)  is  valid  for  turbulent  as  well  as  laminar  flows  by  rq)lacing  the  molecular  transport  coefficients 
with  tfadr  turbulent  counterparts.  The  governing  equations  become  the  so-called  Reynoldr-aver- 
aged  Navier-Stokes  equations. 

Ftir  higM^eynolds-numbers  flows,  fee  viscous  effects  are  confiired  to  a  thin  layer  near  the 
wall  boundary  and  dominated  by  fee  viscous  terms  assodaied  wife  the  strain  rates  nonnal  to  the 
wall  The  viscous  terms  associated  wife  fee  strain  rates  along  the  body  surfEure  are  ctnnparative- 
ly  small  and  negligible.  This  concept  was  first  discussed  by  Prandtl  in  fee  development  of 
boundary-layer  theory  and  has  been  applied  and  extended  to  various  problems.  The  develt^ 
ment  of  the  single  thin-layer  approximation  of  the  full  Navier-Stokes  equations  was  introduced 
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by  Steger  [Reference  1]  and  used  by  Gatlin  [Reference  2]  and  Simpson  [Reference  3],  in  which 
only  viscous  terms  in  the  body-Hiormal,  t},  direction  are  retained.  Here  the  concept  is  extended 
to  the  case  of  thin-layer  approximation  to  two  directions  for  a  general  coordinate  system.  All 
the  viscous  terms  associated  with  cross-derivatives  where  dx-i  where  i?^j)  are  neglected, 

but  with  retention  of  the  viscous  terms  with  normal  second  derivatives  (d^/dx‘  dx^  where  i  =  j). 
This  approximation  retains  the  most  dominant  terms  in  the  governing  equation.  The  thin-layer 
Navier-Stokes  equations  are  obtained  by  retairung  from  the  full  Navier-Stokes  equations  all  the 
viscous  terms  except  for  those  containing  derivatives  in  the  stieamwise,  direction,  as  well  as 
excluding  all  cross-derivatives. 

The  time-dependent  thin-layer  Navier-Stokes  equations  in  general  body-fin;^  coordinates, 
written  in  nondimensional  strong  conversation  law  form  are 

30  ,  aF  ,  3(G-Gv)  .  a(H-Hv)  ^ 

IT  +  ^  + - 5!! —  ^ - “  (2) 

where  the  curvilinear  coordinates  are  defined  as 

I  =  |(x,y,z,t)  ,  ^  =  ^(x,y,z,t) 

r\  =  Ti(x,y,z,t)  ,  T  =  t 

and  the  vector  of  the  dependent  variable  Q  and  the  Euler  flux  vector  F,  G,  and  H  are  given  by 


eu 

6 

QU 

QUU  +  Ixp 

QV 

,  F  =  J 

QVU  +  lyP 

QW 

ewu  +  izp 

e 

U(e  +  p)  -  It  P 

[qV 

1 

qW 

QUV  +  tlxp 

QuW  +  Ixp 

QVV  +  TlyP 

>  H  =  J 

QVW  +  lyP 

gwV  +  Tizp 

QWW  +  Izp 

V(e  +  p)  -  Tit  p 

W(e  +  p)  -  It  p 

with  the  contravariant  velocities,  U,  V,  and  W,  defined  as 

U  =  It  +  +  %yV  +  IzW 

V  =  Tit  +  TlxU  +  IlyV  +  ’IzW 

W  =  It  +  CxU  +  CyV  +  IzW 

and  the  quantity  J  is  the  Jacobian  of  the  invert  transformation  and  is  given  by 
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J  —  X|(yT|Z|-  “  Zrjy^.) 

The  metric  quantities  are  given  by 

lx  =  J"^(yt,zt  -  Zny^)  ’I*  =  J'^Cz^y^  -  yi^^) 

ly  =  J'kztiX;  -  Xy^z^)  qy  =  J"kx|Z^  -  2|X^) 

I2  =  -  y^x^)  i\z  = 

lx  =  J"^(y|Zii  -  Z|yti)  It  =  (-  Xtlx  -  ytly  -  ztiz) 

ly  =  J"kxT,Z|  -  Z|,X|)  »lt  =  (-  Xttlx  -  ytTly  -  Zt^lz) 

lx  =  J"Hx|yi,  -  y|XT,)  It  =  (  -  X,|x  -  ytly  -  Zilz) 

The  pressure,  density,  and  velocity  components  are  related  to  the  enm^  for  an  ideal  gas  by  the 
following  equation 

P  =  (Y  -  l)[e  -  5Q(«^  +  v^  +  w2)j 

where  the  ratio  of  specific  heats,  y,  is  taken  as  1.4 

The  general  form  of  the  thin-layer  viscous  flux  vector  can  be  written  as 

ro 

Tkx 

Sv  =  J  (3) 

Tfa 

uTfcj  +  vTfcy  +  wTkj  - 

where 

=  kxXxx  +  kyXxy  +  kxXxz 
T|jy  =  kxT^qr  +  ^y'^^yy  + 

Tjjj  "  kx^xz  +  l^y'^yz  + 

Tk  »  kxqx  +  kyqy  +  kzqz 

The  viscous  flux  vectors  Ov  and  Hv  are  given  by  die  Sv  vector  ttepending  cm  whether  k  in  Equa¬ 
tion  (3)  is  T),  or  g,  respectively. 

The  elements  of  the  viscous  stress  tensor  are 
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txx  =  [2kxUk  -  kyWy.  -  kjwJ 

-  J^x“k  -  Vk] 

•'xy  =  tyx  =  [kyUk  +  kxvj 

t«  =  [kjUk  +  kxwj 

=  ‘'^y  =  ^  +  ^^k] 

and  the  heat  flux  are 

-  )tMa>  ,  nn 

^  ■  (Y  -  DPrReL 
^  “  (Y  -  l)PrReL 


-  |iM«e  . 

^  *  (Y  -  DPTReL  ^  ^ 

The  element  of  the  viscous  stress  tensor.  Equation  (4),  and  the  heat  flux  terms.  Equation  (5), 
beltmg  to  die  viscous  flux  vectors  Gv^  or  Hy^  when  k  in  these  equations  is  q,  or  g,  respectively. 

A  finite  volume  method  is  adopted  to  ensure  die  final  converged  solution  is  independent  of 
the  integration  procedure  and  to  avmd  metric  singularity  problems.  An  implicit  discretized  inte¬ 
gral  form  of  the  thin-layer  Navier-Stokes  equations.  Equation  (2),  in  computatitm  space  for  a 
cell  with  center  denoted  as  (i,  j,  k)  is 

AQ“  +  At[6iP‘+^  +  6j  (G  -  Gt)“+i  +  6k  (H  -  =  0  (6) 

where 


AQ“  =  0“"^^  -  0“ 

«•(•)  -  -  (•)*_! 

and  i,  j,  and  k  are  the  indices  in  die  |,  q,  and  %  direction,  respectively,  and  n  is  the  temporal  in¬ 
dex.  In  diis  equation,  the  depmident  variable  Q  is  considered  to  be  omstant  throughout  cell  (i,  j. 
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k),  while  the  inviscid  and  viscous  fluxes  are  assumed  to  be  uniform  over  each  of  the  six  surfaces 
of  the  cell.  An  unfactored  implicit  scheme  can  be  obtained  from  the  Equation  (6)  by  linearizing 
the  inviscid  and  viscous  flux  vectors  about  the  previous  time  level  and  dropping  terms  of  the  se¬ 
cond  or  higher  order,  resulting  in  the  following  linear  equation 

[l  -I-  At  (6i  A  •  +  6j  B  •  +6^0*  -  6j  By  •  -  6^  Cy  •  )]  AQ“  =  -  AtR" 

where  (7) 

R"  =  6i  F"  +  6j  G"  +  6k  H"  -  6j  G"  -  6^  H" 

The  inviscid  flux  Jacobians  A,  B,  and  C  and  viscous  flux  Jacobians  By  and  Cy  terms  arising  from 
linearization  are  given  by 


^Gy  _  _  3Hv 

dQ  ’  ^  dQ 


2.  BOUNDARY  CONDITIONS 


It  is  well  known  that,  when  dealing  with  a  time-marching  formulation,  the  reflecting  behav¬ 
ior  of  the  numerical  boundary  conditions  must  be  minimized  in  order  to  enhance  the  conver¬ 
gence  rate  to  the  steady  state.  In  the  present  work  all  boundary  conditions  are  explicitly  imple¬ 
mented.  They  include  farfreld,  solid  wall,  and  block-to-block  boundary  conditions.  All  farfield 
(i.e.  inflow-outflow)  and  inviscid  impermeable  boundaries  used  characteristic  variable  boundary 
condititms  as  derived  in  Reference  [4]  for  stationary  grids  and  in  Reference  [5]  for  dynanuc 
grids.  As  in  these  references,  the  boundary  conditions  are  implemented  utilizing  one  layer  of 
points  outside  of  the  computational  field,  called  phantom  points,  which  results  in  first-order  ac¬ 
curacy  at  the  boundaries.  The  change  in  dependent  variable,  A(^,  is  set  to  zero  for  all  bound¬ 
aries  excq)t  at  block-to-block  boundaries.  The  approach  adopted  here  with  regard  to  block-to- 
block  intBrfrK:e  communication  is  a  direct  extraction-injection  procedure  which  was  taken  by 
Belk  [Reference  S] .  This  means  the  informatitm  (such  as  Q,  AQ)  from  within  the  domain  of 
(me  bl<mk  can  be  extracted  and  then  injected  as  phantom  data  in  an  adjacent  bltxtit.  In  Reference 
[5],  Belk  investigated  many  of  the  dilemmas  posed  when  attempting  time-accurate  flowfield 
simulations  using  dynamic  bltmked  grids.  Belk  showed  that  using  synchronized  dependent  vari¬ 
ables  and  approximating  the  value  of  the  solution  vector  required  at  block  boundaries  with  what¬ 
ever  infoimaticm  is  currently  available  fixmi  adjoining  blocks  introduces  an  0((At) VAx)  error  at 
the  boundary  and  gives  unsteady  results  that  compare  well  with  unbltmked  results  even  for  cases 
with  a  shock  wave  passing  through  the  blcxdc  boundary.  For  viscous  wall  boundaries,  a  no-slip 
implementaticm  of  the  zero  normal  pressure  gradient  boundary  condititm  is  applied  at  the  b(xly 
surface.  In  the  case  of  wall  heat  transfer,  the  wall  tempmuture  Tw  is  specified,  and  the  density  at 
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the  wall  is  computed  from  the  equation  of  state,  knowing  the  surface  pressure  and  surface  tem¬ 
perature,  to  wit 

Pp  “  Pf  “  Pw 
Vp  =  2Xw  -  Vf 
Ow  *  (YPw)/Tw 
Op  =  2ew  -  Of 

Cp  =  (Y  -  1)~*  Pp  +  2  +  vj  +  wj> 

where  Xw  is  wall  grid  speed  and  the  subscripts  p,  f,  and  w  denote  phantom  points,  field  points, 
and  wall  boundary  points  respectively. 

3.  TURBULENCE  MODEL 

Closure  of  the  governing  equations  is  achieved  by  using  a  turbulence  motkl  to  obtain  the 
eddy  viscosity  values.  In  this  study,  the  viscosi^  coefficient  in  Equatitm  (2)  for  turbulent  flow  is 
modeled  as  tiie  sum  of  the  laminar  and  turbulent  viscosity  in  the  eddy  viscosity  q>proach.  The 
turbulent  eddy  viscosity  pt  is  computed  by  using  the  algebraic  viscosity  model  due  to  Baldwin 
and  Lmnax  [Reference  6].  It  is  a  two-layer  model  in  which  an  eddy  viscosity  is  calculated  for 
an  inner  and  outer  region.  The  inner  region  foUows  the  Frandti-van  Driest  formulation.  In  both 
the  inner  and  outer  formulations  the  distribution  of  vortidty  is  used  to  determine  the  length 
scales,  tiiereby  avoiding  the  necessi^  of  finding  the  outer  edge  of  boundary  layer.  For  the  inner 
region, 

=  q€^  kol 

where 

<  -  kyj^l  -  «p(^)] 

and 

y  is  the  nonnal  distance  from  die  wall,  kol  is  the  absolute  magnitude  of  vorticity,  and  k«0.4  is  die 
von  Kaiman  constant 

The  eddy  viscosity  for  die  outer  regitm  is  giwn  by 

PW  "  ^^^^Wake^KUbCy) 
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where 


=  minimum(  ymax  ^nax  ,  ymax 

\  ^max 

The  quantities  Vm«  and  F»n«  are  determined  from  the  function 

FOr)  -  y|<a|  -  o?'(^)] 

where  Faux  is  the  maximum  value  of  F(y)  and  Vm«r  is  the  value  of  y  at  which  it  occurs.  The 
function  Fudy(y)  is  the  Klebanoff  intermittency  factor  determined  from 

F»«(y)  -  1  + 

The  quantity  U<fifr  is  the  difference  between  the  maximum  and  minimum  total  velocity  in  the 
profile  and,  for  boundary  layers,  the  minimum  is  defined  as  rsero. 

It  is  necessary  to  specify  the  following  constants:  C^l.6,  Cueb=0.3,  Cwk=0.25,  and 
K=0.0168  is  the  Causer  omstant 


SECTION  m 

NUMERICAL  FLUX  FORMULATION 


This  section  is  concerned  with  the  numerical  flux  fomulation  of  the  convective  ttfms.  In 
this  wOTk  the  equations  are  discretized  into  a  cell-center  finite  volume  formulation  that  dq)ends 
on  the  numerical  flux  at  each  cell  face  of  the  finite  volume.  There  are,  of  course,  many  methods 
available  now  to  determine  this  cell-face  numerical  flux.  These  methods  range  from  central  dif¬ 
ferencing  to  high  resolution  upwind  schemes  [Reference  7].  Simpson  [Refocnce  8]  dmnon- 
strates  that  exceptional  results  for  a  laminar  viscous  flow  can  be  obtained  using  a  high  resolution 
method  based  on  Roe’s  r^roximate  Riemann  solver  [Reference  9]  with  only  a  few  points  in  the 
viscous  region  compared  to  a  flux  vector  split  scheme  [References  2  and  10]  with  many  m(»e 
points. 

During  the  course  of  this  research  it  became  apparent  that  the  approximate  Riemann  solver 
of  Roe  has  much  to  offer  with  regard  to  the  quality  of  the  numerical  solution  of  the  Euler  and 
Navier-Stokes  equations.  There  are  various  methods  in  iise  fcv  extending  the  first-order  Roe 
scheme  for  the  convective  fluxes  to  higher  cxder.  Two  of  tiiese  methods  wiU  be  discussed  and  a 
third  method  will  be  presented  which  was  develqxxi  and  used  for  all  results  presented  in  this  re¬ 
port 

Pot  multidimensional  flow  the  assumption  is  made  that  the  waves  move  ncxmal  to  the  cell 
face.  Therefore,  it  is  sufficient  to  present  tibe  numerical  flux  vector  in  only  one  dimension,  as  the 
same  fmmulation  is  used  in  the  other  coordinate  directions. 

The  flrst-txder  numerical  flux,  f*,  at  cell  face  i  +  1/2  resulting  from  Roe’s  approximate  Rie¬ 
mann  solver  can  be  expressed  by  any  one  of  the  following  three  relations 


or 


(  f  (Q,)].  ,  +  ya  .  1  r® 


(8) 


‘+2  J.>+i  i+i  i+i 


(9) 


j-1 


1+il 


i+5 


(10) 


where 

% 
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,0 

i+il 


=  -  X-® 

i+i 


•+5 


a  =  -  (Q.^,  - 

j.i+i  i+i  ''^1+1 


(11) 

(12) 


and  n  =  5  for  three-dimensional  fluid  flow.  In  Eqs.(8-10),  ®  c<»Tespond  to  the  positive  and 

negative  eigenvalues  of  the  Roe  matrix,  r(il  are  the  right  eigenvect(xs  of  the  Roe  matrix,  art 

left  eigenvectms  of  the  Roe  matrix,  and  the  scalars  aj  are  jumps  in  the  characteristic  variables. 
The  subscript  i+1/2  in  the  equations  above  indicates  that  the  metrics  used  correspond  to  the  cell 
face  located  at  i+ip..  The  dependent  variables  in  the  eigenvalues  and  eigenvectors  are,  ot 
course,  the  Roe  averaged  variables  [Refo-ence  9]  at  cell  face  i+1/2  since  they  comprise  the  ei- 
gensystem  of  the  Roe  matrix.  The  flux  vectors  f  (Q^  and  flC^k-l)  in  Equations  (8-10)  are  eva¬ 
luated  using  the  dependent  variable  vectrx*  as  indicated  (not  the  Roe  averaged  variables),  but  the 
subscript  i+1/2  on  the  bracket  indicates  the  metrics  at  i+1/2  are  to  be  used. 

Equations  (8-10)  can  also  be  written,  respectively,  as 


(Q,„-Qp 

-  I  f  \.i-  S'(Q, .  Q,..)  (Qi,.  -  QP 
f “  jl  f  (QP  +  f  (Qim)  -  5®  «3i  •  -  op 

where  /I  is  the  usual  diagonalized  Roe  matrix  composed  of  Roe  averaged  variables  with 

_ ±  +  _i 

A  =  TA^T  '■ 
lAI  =  a"^  -  a  " 

The  matrix  T  in  Equation  (16)  has  as  its  columns  the  right  eigenvectors,  r^^,  of  the  Roe  matrix. 

The  matrices  A  ^are  diagonal  matrices  with  eigenvalues,  X  ^  of  the  Roe  matrix  along  the  di¬ 
agonal. 


(13) 

(14) 

(15) 

(16) 
(17) 


One  of  the  more  straightfmward  and  frequently  used  methods  for  extending  the  Roe  scheme 
to  higher  wder  is  tiie  so-called  MUSCL  reproach  introduced  by  Van  Leer  [Reference  11].  This 
approach  is  a  dq>endent  variable  extrapolation  method  whereby  a  dependrat  variable  on  either 
side  of  a  cell  face,  denoted  and  Ql  .  ia  extrapolated  up  to  either  side  of  the  cell  face  and  these 

extri4)olated  variables  are  then  used  in  the  Roe  f(xmulation,  e.g.  Equation  (15),  to  compute  the 
numerical  flux  at  a  cell  face.  Ihe  higher-<xxi«'  Roe  flux  could  then  be  computed  from  the  rela¬ 
tion 


12 


(18) 


fi+l/2  =  ^[f(QL)  +  f(QRMi-H/2  -  ^IA(Ql.Qr)I  (Qr  -  Ql) 
where  Q^i  has  been  replaced  by  Qr,  and  has  been  replaced  by  Ql- 

It  should  also  be  noted  that  limiting  can  be  used  in  this  formulation  by  including  limiters  in 
the  computation  of  the  extrapolated  dependent  variables  and  as  was  done  by  Anderson, 
Thomas,  and  Van  Leer  [Reference  12].  Extrapolation  formulas  without  limiting  are  also  pres¬ 
ented  in  Reference  12  as  well  as  Reference  7. 

Anotho*  approach  is  the  numerical  flux  family  of  higher-order  accurate  TVD  schemes  using 
Roe  averaging  [Reference  9]  introduced  by  Osher  and  Chakravarthy  [Reference  13].  The  nu¬ 
merical  flux  for  this  family  of  TVD  schemes  for  up  to  third-order  can  be  written  as  [Reference 
14] 

Ci/2  =  |[f  (Q,)  +  f  -  05.1/2) 


j=i  *■ 


i-1/2  ^  4  "^1+1/2 


i+3/2  4  ■’  1+1/2 


where 


0^.,/2  =  K.f/2  (20) 

and  again  n  =  5  for  three-dimensional  fluid  flow.  The  parameter  r{;  is  typically  -1,0,  m*  1/3  and 
controls  the  magnitude  of  the  truncation  error  without  limiting  [Reference  IS]. 

The  functions  Lj^  (€,  m)  used  in  Equation  (19)  are  limiters.  Three  limiters  were  used;  the 
minmod.  Superbee,  and  Van  Leer.  The  minmod  limiter  is: 


Lf  «,m)  =  imiimod(o*^</j, 


where 


minmod  (x,y)  =  sign(x)  max{0,  min[lxl,  y  sign(x)]}  (2 

b  0 

1  —  tj) 

and  the  parameter  b  is  a  compression  parameta*  [References  13, 16].  The  Superbee  limiter  is 
due  to  Roe  and  is: 


Lf  »>)  = 
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where 


cmplim(x,y)  =  sign(x)  max{0,  inin[lxl,  P  y  sign(x)],  min[P  Ixl.y  sign(x)]}  (25) 


The  compression  parameter  P  in  Equation  (25)  is  stated  by  Sweby  [Reference  17]  to  be  in  the 
range  l^P:S2(P  =  2is  typically  used).  The  Van  Leer  limiter  is: 

Lf  (€.  m)  =  vl  (26) 

where 


vl  (x,y) 


xy  +  Ixyl 
X  +  y 


(27) 


Note  in  Equation  (19)  that  although  tiie  numerical  flux  at  cell  face  i+1/2  is  being  computed, 
metrics  and  Roe  variables  at  cell  faces  i-1/2  and  i-t-3/2  are  involved.  The  additional  terms  added 
to  the  first-order  numerical  flux  in  Equation  (19)  to  achieve  higher-order  accuracy  nught  be 
viewed  as  an  extrapolation  of  flux  differences,  as  opposed  to  the  extrapolation  of  dependent  vari¬ 
ables  as  in  the  MUSCL  iqiproach. 

A  third  tqiproach  to  obtaining  a  higher-order  numerical  flux,  and  the  one  used  in  this  work, 
is  one  that  falls  somewhere  between  the  dependent  variable  extrapolation  (MUSCX)  and  the  flux 
extrapolation  approaches.  Since  any  of  Equations  (8),  (9),  or  (10)  give  the  same  first-order  nu¬ 
merical  flux,  and  because  the  evaluation  of  Equation  (10)  requires  a  slightly  larger  number  of 
floating  point  operations  than  either  of  Equations  (8)  or  (9),  Equation  (8)  is  used  for  the  first-or¬ 
der  numerical  flux  along  with  certain  additional  terms  to  obtain  the  following  higher-order  nu¬ 
merical  flux 


^i+l/2  “  (Qi)li+l/2 


j-1 

[L/-(-l,l)-Lj-(3,l)]  + 


1  +tl» 

4 


[L/(1.  -  1)  -  Lj  (1,3)]| 


r® 

i+l/2 

(28) 


where  the  o^  used  in  the  limiters  are  now  defined  as 


”j,i+P/2  ^ 


i+1/2  “j.i+P/2 


(29) 


where 
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■’i.i-l/i  “  ‘“./2  • 


(30a) 


"ii-n/a  =  l®,/2  •  (Qi+i  -  Qi)  (SOW 

Oii+a/a  =  l®,/2  •  (Qi+2  -  Qi+i)  (3(te) 

The  limiters  retain  the  same  de^tion.  However,  when  using  Equations  (29)  and  (30)  in  place 
of  Equation  (20),  the  Superbee  and  Van  Leer  limiters  reduce  to 

L*  (€,  m)  =  L*  (m,  €)  (31) 

and  the  scheme  is  independent  of  tp  when  these  limiters  are  used.  This  would  then  appear  to  be 
like  a  second-order  Fromm  scheme  which  occurs  for  =  0.  Both  second-  and  third-order  ac¬ 
curacy  can  still  be  obtained  with  the  minmod  limiter. 

Results  obtained  to  date  using  Equation  (28)  have  been  superior  to  the  results  obtained  using 
Equation  (19).  Another  advantage  of  Equation  (28)  over  Equation  (19)  is  that  the  operation 
count  is  less.  With  regard  to  limiters,  solutions  have  been  obtained  using  each  of  the  three  limit¬ 
ers;  minmod.  Superbee,  and  Van  Leer.  The  solutions  differ,  but  for  many  problems  the  solutions 
are  reasonably  close.  For  the  most  part.  Van  Leer  and  second-  and  third-order  minmod  give 
about  the  same  results.  Superbee  is  a  more  compressive  limiter  as  it  was  evidently  designed  to 
be.  The  biggest  difference  in  the  limiters  has  to  do  with  convergence.  Van  Leer  has  always  giv¬ 
en  the  best  convergence  rate  and  Superbee  usually  gives  the  worst  Minmod  has  shown  tenden¬ 
cies  to  go  into  limit  cycles.  (Considering  both  quality  of  results  and  convogence  rate,  Van  Leer 
is  presently  the  suggested  limitet  In  view  of  the  property  of  the  Van  Leer  limits:  expressed  by 
Equation  (31),  the  second-order  numerical  flux  at  a  cell  face  can  be  written 

n 

fi+l/2  “  tf  (Qi)]i+t/2  +  X  1/2 

1  ^  -  (8 
+  1.  *)- Li  (1.3)1  r® 

j-1 

where  Lj^  (€,  m)  is  given  by  Equation  (26),  or  Equation  (24)  for  that  matter,  since  Superbee  sat¬ 
isfies  Equatitm  (31)  and  o  and  a  are  given  by  Equations  (29)  and  (30). 


IS 


SECTION  IV 


NUMERICAL  SOLUTION  SCHEMES 

In  this  cell-centered  finite  volume  formulation  the  method  used  to  obtain  the  numerical  flux 
at  a  cell  face  has,  of  course,  a  strong  influence  on  the  quality  of  the  numerical  solution.  The 
method  used  for  the  numerical  flux  at  a  cell  face  also  has  an  influence  on  the  choice  of  the  nu¬ 
merical  method  used  to  solve  the  resulting  system  of  algebraic  equations.  For  example,  the  use 
of  a  flux  vector  split  scheme  [Reference  18]  for  obtaining  the  numerical  flux  at  a  cell  face  leads 
naturally  to  a  solution  matrix  composed  of  elements  corresponding  to  the  Jacobian  of  the  flux 
vector  represented  by  only  positive  eigenvalues  and  the  Jacobian  of  the  flux  vector  represented 
by  only  negative  eigenvalues.  The  solution  matrix  resulting  from  this  flux  vector  split  formula¬ 
tion  leads  more  or  less  naturally  to  a  factored  LU  type  solution  scheme  [Reference  19]  that  can 
be  coded  into  a  rather  efficient  algorithm.  This  factored  LU  type  scheme  was  put  forth  by  Bun- 
ing  and  Steger  [Reference  20]  for  the  solution  of  a  two-dimensional  finite  difference  formula¬ 
tion  of  the  equations. 

During  the  course  of  this  research  various  implicit  schemes  were  developed  and  used  for 
the  numerical  solution  of  the  three-dimensional  unsteady  Euler  and  Navier-Stokes  equations. 
The  purpose  of  this  Section  is  to  present  these  implicit  methods  and  illustrate  how  each  was  de¬ 
veloped  in  succession.  This  is  carried  out  by  first  reviewing  Newton’s  method  for  the  solution  of 
nonlinear  systems  of  equations  because  this  is  the  basic  form  in  which  the  equations  are  cast  in 
order  to  gain  generality  with  regard  to  the  solution  of  the  unsteady  equations  so  that  the  equa¬ 
tions  can  be  converged  at  each  time  step  if  desired. 

1.  NEWTON’S  METHOD 

Consider  the  system  of  nonlinear  equations  written  in  the  general  form 

Fi(XiX2....,X„) 

F2(XiX2....,X„) 

Fn(XiX2,.,.,X„j 

These  equations  could,  for  example,  be  the  three-dimensional  unsteady  Euler  equations  where 
nasS  and  Fi  through  Fs  are  the  equations  for  mass,  momentum,  and  energy,  and  components  xi 
through  X5  are  the  conserved  variables  of  density,  momentum,  and  energy.  Equations  (33)  could 
also  be  a  system  of  nonlinear  algebraic  equations  resulting  from  the  discretization  of  the  three- 
dimensional  unsteady  Euler  equations  on  a  computational  grid.  The  number  of  equations  and 


0 

0 


=  0 


(33) 
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dependent  variables  can  then  be  extremely  large  due  to  the  fact  that  there  are  five  equations  and 
five  unknown  dependent  variables  for  every  cell  or  node  point  in  the  computational  domain.  Of 
particular  interest  here  is  the  latter  case,  where  a  solution  is  required  for  the  large  number  of 
nonlinear  algebraic  equations  that  result  from  a  finite  volume  discretization  of  the  unsteady 
three-dimensional  Euler  or  Navier-Stokes  equations. 

Let  F  be  a  vector  function  of  the  coordinate  functions  Fi,  F2, .....  Fn  where  xi,  X2, ....  Xn  are 
coordinates  of  the  vector  x.  Then  the  system  of  equations  given  by  Equations  (33)  can  be  writ¬ 
ten  as 


F(x)  =  0 


(34) 


Newton’s  method  for  the  vector  function  F(x)  can  be  written  (see,  for  example,  Ortega  and 
Rheinboldt  [Reference  21])  as 

^m+i  ^  _  (F'(x"‘))"^  F(x"*) 

where  m  =  1,2,3,...  and  F'(x)  is  the  Jacobian  matrix  of  the  vector  F(x)  and  is  given  by 


F'(x)  = 


^lll(x) 

ai2(x)  . . . 

ainW 

a2i(x) 

• 

• 

a22(x)  . . . 

• 

• 

a2n(x) 

• 

• 

• 

• 

ani(x) 

• 

• 

an2W  . . . 

• 

• 

ann(^) 

(35) 


(36) 


where 


_  aFi(x) 


=  ax. 


(37) 


It  is  usually  impractical  to  obtain  the  inverse  of  the  matrix  operator  F'(x).  A  more  practical  way 
of  writing  Newton’s  method  for  obtaining  numerical  solutions  is 


F'  (x“)  (x*"-^^  -  x"*)  =  -  F(x“) 


(38) 


Two  good  references  on  Newton’s  method  for  nonlinear  systems  of  equations  are  the  books 
by  Ortega  and  Rheinboldt  [Reference  21]  and  Dennis  and  Schnabel  [Reference  22].  By  taking 
advantage  of  these  (and  other)  worics  on  Newton’s  method,  it  is  a  rather  simple  matter  to  state 
the  problem.  However,  it  is  an  entirely  different  matter  to  numerically  solve  the  resulting  system 
of  equations. 
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2.  NEWTON’S  METHOD  COMPARED  TO  NORMAL  LINEARIZATION 


It  is  interesting  to  compare  the  equations  resulting  from  the  application  of  Newton’s  method 
with  the  equations  resulting  fix)m  what  is  refened  to  here  as  normal  linearization  as  introduced  in 
References  [23  and  24].  For  illustration  purposes,  consider  the  first-order  nonlinear  hyperbolic 
system  that  has  been  transformed  from  x,  t  space  to  t  space 


^  = 
^  d| 


(39) 


Discretizing  this  equation  in  the  finite  volume  sense  with  A|  =  1,  and  using  a  Hrst-order  back¬ 
ward  difference  in  time,  the  resulting  system  of  difference  equations  that  must  be  solved  can  be 
written  as 


Qn  +  l  _  QJ. 

— i - -  +  f. 

At 


(40) 


or 


-  Q" 

- !-  -1-  6i  f  (Q"+i)  =  0  •  (41) 

The  normal  method  introduced  in  References  [23  and  24]  to  linearize  Equation  (41)  would 
be  to  linearize  the  vector  function  f(Q)  to  obtain 

-  Q!* 

*  »  -I-  6i  (f  (Q")  +  f '  (Q")  (Q"+^  -  Q"))  =  0  (42) 

wheref  '(Q)  is  the  Jacobian  matrix  of  the  vector  function  f  (Q).  In  the  CFD  literature  Equation 
(42)  is  usually  written 

a  +  At  6i  A  •)  (Q”+i  -  Q")  =  -  At  Sj  f  (Q")  (43) 

where 

A  =  f '  (Q")  (44) 

On  the  other  hand,  to  solve  Equation  (41)  using  Newton’s  method,  the  left  hand  side  of 
Equation  (41)  is  simply  tire  vector  function  F  given  by  Equation  (34)  and  the  vector  x  in  Equa¬ 
tion  (34)  is  the  dependent  variable  vector  Q  at  time  level  n+1,  Le.  x  =  Newton’s  method  is 
now  formulated,  and  the  system  of  difference  equations  that  must  be  solved  is  given  by  Equation 
(38);  or,  in  the  nomenclature  of  this  section,  the  equations  that  must  be  solved  are  given  by 
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a  +  At  6i  A  •)  (Q“+l.ni+l  _  Qn  +  l.mj  =  _ 


Qn+l.m  _  Qn 


+  6i  f  (Q“+i-®) 


where  in  this  case 

A  =  f  '(Q^+i-”)  (46) 

Note  that  if  the  initial  qjproximation  (m=l)  to  0°^^  of  Equation  (45)  is  taken  as 

QH  +  l.l  _  Qfl  (47) 

then  the  first  iteration  of  Equation  (45)  is  the  same  as  the  solution  (^^  of  Equation  (43).  An 
obvious  difference  between  the  normal  linearization  resulting  in  Equation  (43),  and  Newton’s 
method  resulting  in  Equation  (45),  is  that  the  time  derivative  is  contained  in  the  residual  term 
(right  hand  side  of  the  equation)  in  Equation  (45),  whereas,  the  residual  term  in  Equation  (43) 
does  not  contain  the  time  derivative.  For  steady  state  problems  where  time  could  be  an  iteration 
parametffl-,  continued  itn'ation  of  Equation  (43)  would  presumably  lead  to 


6i  f  (Q")  -  0  (49) 

On  the  other  hand,  for  unsteady  problems,  a  solution  of  Equation  (43)  does  not  insme  that 

f(Q“)  =  0  (50) 

whereas,  a  converged  solution  of  Equation  (45)  does  insure  that  the  unsteady  equation.  Equation 
(41),  is  satisfied. 


3.  SOLUTION  SCHEMES 

Of  particular  interest  here  is  the  numerical  solution  of  the  so-called  upwind  formulation  of 
the  equations  for  three-dimensional  time-dependent  problems.  An  implicit  diree-factor  scheme 
of  the  three-dimensional  upwind  equations  requiring  three  block  tridiagonal  solutions  similar  to 
the  scheme  of  References  [23  and  24]  was  shown  by  Anderson  [Reference  25]  to  be  conditional¬ 
ly  stable  with  a  maximum  CFL  number  of  wder  10.  Anderson  [Reference  25]  also  investigated 
the  stability  properties  of  a  two-factor  LU  scheme  considered  in  Reference  19,  and  found  it  to 
have  improved  stability  properties  compared  to  the  three-factor  scheme.  This  was  also  the  con¬ 
clusion  of  a  more  detailed  analysis  performed  by  Mansfield  [Reference  26].  The  two-factor 
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scheme  (usually  referred  to  as  the  two-pass  scheme)  is  a  workhorse  scheme  used  to  solve  nu¬ 
merous  steady  and  unsteady  flow  problems  about  rather  complex  three-dimensional  configura¬ 
tions.  Although  the  two-pass  scheme  has  been  described  several  times,  it  will  be  reviewed  brief¬ 
ly  again,  because  a  new  scheme  will  be  presented  that  involves  modifications  of  the  two-pass 
scheme. 


a.  TWO-PASS  SCHEME 

Considra-  the  three-dimensional  extension  of  Equation  (39)  written  as 

aQ  .  ^  .  dg(^  ,  ah(Q)  ^  0 
at  dti  dC 

The  finite  volume  discretization  of  Equation  (51)  analogous  to  Equation  (41)  is 

-QTn 


(51) 


‘ijjc 


\n  + 1\  _ 


At 


+  6i  f  (Q“-^i)  +  6j  g((5”+0  +  6k  h(Q"+') 


0 


(52) 


The  flux  vector  split  form  of  the  equations  (see,  for  example.  Reference  19)  can  be  written  as 

<3”.V  -  Q"  I, 

+  6i(f+«3“+^)  +  f-(Q"-"^)]  +  6j[g+(Q"-^i)  +  g-(Q"-'‘)] 


+  6k[h+(Q"'^»)  +  h-(Q"+^)]  =  0 


(53) 


Newton’s  method  for  this  three  spatial  dimension  problem  that  is  analogous  to  Equation  (45)  is 
[I  +  At(6{  A^  +  6j  A*  6j  -I-  6j  B*  +  6^  G  -^6^0  )] 


where 


^Qn+l,m+l  _  Qn+l.inj  _  _  ^^Rn+l,m 


±  ^  af*(Q°-»-^’«") 

^  acjn+ljn 


(54) 


±  _  3gi=(QnH-ljn) 


B*  = 


aQ" 


+  l4n 


|-»n+l,  m  _ 

>«+l,m 

^  At 


±  ^  ah=fc(Q"^-^»”) 

a(yi+ljn 

+  6i[f*-((2“-*'^-"‘)  +  f-(Q"+^-‘")] 


+  6j[g+(Q"+*-"*)  +  g"(Q"'^^*”)l  +  6k[h+((5"‘^^’'")  +  h"(Q"+*-'")] 
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Defining 


_  Qn+ljn+l  _  Qn+Ijn 
6  =  6iA+  +  6jB+  + 

6  M~  =  SjA"  +  6jB-  +  6kC" 

Equation  (54)  can  be  written 

a  +  Ax6M‘^  +  At8M")X“  =  -  AtR“+^*“ 
The  two-pass  (LU)  scheme  for  solving  tfiis  system  is 

a  +  AtaM"^)  a  +  At5M“)X“  =  -  AtR“+^-“ 
which  can  be  solved  in  the  following  steps 

a  +  At8M^)Xi*”  =  -AtR“+^-“ 
a  +  Ax8M~)X2*®  =  X^-® 

Qn+l,m+l  _  Qn-fl.m  _  x2,m 


(55) 


(56) 


(57) 


(58) 


b.  MODIFIED  TWO-^*ASS  SCHEME 


To  obtain  the  modified  two-pass  scheme,  first  expand  Equation  (56)  rather  than  factor  it  to 
obtain 


X®  +  Ax(M.^  X“  -  M.^  ,  X“  ,)  +  Ax(M. . ,  X“  ,  -  M.  X”)  =  -  AxR?+i'® 

1  II  1-1  1-1  1+1  *  1  » 


(59) 


(The  n(»nenclatuie  gets  a  bit  sticky.  The  single  subscript  i  represents  the  point  in  three-dimen- 
sicmal  computational  space  corresponding  to  the  point  iJJL  Subscript  i+1  corresponds  to  i+1  JJq 
y+l  Jq  or  i  One  simple  way  of  following  tiie  argument  is  to  think  of  this  as  a  one-dimen¬ 

sional  problem.  Remember,  however,  that  the  matrix  M  is  the  sum  of  the  flux  Jacobians  in  each 
of  the  three  computatimial  diiectitms,  and  subscript  i+1  can  represent  any  of  the  tiiree  adjoining 
cells  tfiat  are  in  die  positive  computational  directitm  of  the  cell  ij  Jq  and  the  subscript  i-1  can 
iq;nesent  any  of  the  three  adjoining  cells  that  are  in  the  negative  computational  direction  of  the 
cell  ij  Jk.)  Dividing  Equation  (59)  by  the  time  step.  Ax,  one  obtains 

(  Xi  +  M+  -  M-)  Xf  -  M,t,  X?L,  +  M- ,  =  -  R”*'"  (60) 

or 
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(61) 


Xf_,  +  M- ,  X”  ,  =  - 

where 

D  =  +  M"^  -  M"  (62) 

There  are  at  least  two  ways  of  looking  at  the  modified  two-pass  scheme.  One  is  to  view  the 
method  as  a  factored  scheme,  and  another  is  to  view  the  method  as  a  relaxation  scheme.  Both 
points  of  view  will  be  considered  next 


(1)  MODIFIED  TWO-PASS  AS  A  FACTORED  SCHEME 


Assuming  D  to  be  nonsingular.  Equation  (61)  can  be  written  as 

xr-DfV-ixr.i  +  =  -d.-»r;‘^^*'" 

Equation  (63)  can  be  factored  just  as  the  two-pass  (LU)  scheme  to  obtain 

a  -  DriM.ti)(I  +  Df'Mi+OX”  =  -  Di-^R”-*-^*™ 

Equation  (64)  can  also  be  solved  in  two  passes  (two  steps)  by 
« 

(Dj  -  Mitj)X'*“  =  - 

(Dj  +  Mi+i)X2^  =  DpC^-*" 
or 


DiX^*”  +  Mi+i  X^  =  0 


vH+ljn+l 


Ijn 


(63) 


(64) 


(65) 


(2)  MODIFIED  TWO-PASS  AS  A  RELAXATION  SCHEME 


(Consider  solving  Equatimi  (61)  using  the  Gauss-Seidel  method.  In  Equation  (61)  D  is  a 
block  diagonal  matrix,  M*^  is  a  block  lower  triangular  matrix  with  zeros  on  the  diagonal,  and  M~ 
is  a  block  upper  triangular  matrix  with  zeros  on  the  diagonal.  Considning  X^«°*  to  be  initially 
zero,  the  Gauss-Seidel  method  is 


j^n+ljn 

i 


(66) 


After  completing  this  forward  pass  the  X^>°*  are  known,  and  a  backward  pass  can  tiien  be  carried 
out  to  complete  a  symmetric  Gauss-Seidel  solution  by 


“^i-i 


+ 


Mi+i  X 


2jii 
i-f  1 


-  R 


n  +  ljn 


(67) 
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Some  computational  conveniences  can  be  realized  by  noting  the  following.  Using  Equation 
(66)  for  the  vector  M-t.  one  can  write  Equation  (67)  as 


Dpcf"  +  Mf+i  X^,  =  DpC,'" 
SO  that  in  the  backward  pass  one  can  actually  solve  for  ,  where 


(68) 


(69) 


just  as  in  Equations  (65).  This  reduces  the  number  of  matrix-vector  multiplications  that  would 
otherwise  need  to  be  carried  out  The  symmetric  Gauss-Seidel  relaxation  scheme  is  then  given 
by  Equations  (66),  (68),  and  (69). 

The  end  result  of  this  is  tiiat  both  points  of  view  (that  is,  whetho-  the  modified  two-i)ass 
scheme  is  omsidered  a  factored  scheme  ot  a  symmetric  Gauss-Seidel  relaxation  scheme  without 
residual  updating)  are  the  same. 

It  was  recently  discovered  that  the  Russian  Samarskii  [Reference  27]  put  forth  a  scheme  in 
1964  that  is  similar  to  this  modified  two-pass  scheme,  and  it  is  referred  to  in  Refoence  [27]  as 
the  alternate-triangular  method.  The  analysis  of  the  alternate-triangular  method  presented  in 
Reference  [27]  requires  rather  strict  properties  of  the  solution  operators,  and  the  present  solution 
operators  do  not  satisfy  these  properties.  Application  and  analysis  of  the  alternate-triangular 
method  to  elliptic  difference  equations  is  presented  in  Chapter  10  of  the  Russian  book  by  Sa¬ 
marskii  and  Nikolaev  [Reference  27]. 


c.  N-^ASS  SCHEME 


It  was  shown  above  that  the  Modified  IVo-Pass  Scheme  could  be  viewed  as  a  forward  and 
backward  pass  of  a  symmetric  Gauss-Seidel  scheme  if  die  initial  guess  for  was  taken  as 
zero  and  the  solution  process  was  terminated  aftmr  one  forward  and  one  backward  pass.  By  as¬ 
suming  an  initial  guess,  denoted  ,  the  first  pass  of  this  relaxation  scheme  analogous  to 

Equation  (66)  can  be  written  as 


DiXj"  -  M|t,  x|f,  +  M,-;,  x|^;",  -  -  Rf*’"  (70) 

A  backward  pass  analogmis  to  Equation  (67)  can  be  written  as 

DiX?"  -  Mjt,  X|f,  +  M‘ ,  X?f,  -  -  r“'"  (71) 

This  process  can,  of  course,  be  repeated,  and  is  done  so  to  obtain  the  N-Pass  Scheme  given  by 

DjXf”*-”  -  Mill  +  M.’ 1  (72) 
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and 


DiX" 
1  1 


i-l 


+  M 


i+1 


yPJO  _ 
-^i+1  “ 


_  j^n+ljn 
I 


(73) 


where  p  =  2,4,6.  •.2N. 

From  the  indexing  used  above  a  better  terminology  for  this  method  might  be  to  call  this  a 
2N-Pass  Scheme,  or  perhaps,  an  N-Cyde  Scheme  if  one  forward  and  one  backward  pass 
through  the  computational  domain  were  considered  to  constitute  one  cycle.  Regardless  of  what 
constitutes  a  pass  ex'  a  cycle,  notice  that  this  scheme,  given  by  Equations  (72)  and  (73),  is  simply 
symmetric  Gauss-Seidel. 


4.  DISCRETIZED  NEWTON  METHOD 


All  terms,  by  conventional  standard,  appearing  in  the  Equation  (54)  should  result  from  a 
single  flux  formulation.  Barth  [Reference  28]  encountered  difficulty  with  the  fmmal  lineariza¬ 
tion  of  the  Roe  flux  functions.  He  concluded  that  although  superim*  convergence  rate  could  be 
obtained  using  the  fmmal  linearization,  the  computational  e^nse  made  it  unattractive.  Good 
results  over  the  past  few  years  have  been  obtained  by  a  hybrid  Roe  scheme,  which  utilized  the 
true  positive  and  negative  Jacobians  derived  from  flux-vector  splitting  on  the  left-hand-side  of 
Equation  (54)  and  the  residual  term  R  on  the  right-hand-side  of  Equation  (54)  from  flux-differ¬ 
ence  splitting.  However,  one  could  say  that  the  hybrid  Roe  scheme  is  inconsistent  in  that  the  in- 
viscid  flux  Jacobians  used  in  the  solution  operator  emrespond  to  a  flux-vecta-  splitting  scheme, 
while  the  residual  vectw  is  computed  using  the  Roe  scheme.  A  method  which  resolves  this  in¬ 
consistency  when  the  Roe  scheme  is  used  to  evaluate  the  inviscid  flux  vectors,  is  given  in  Refer- 
ence[29].  This  iq>proach  will  work  in  principle  for  any  scheme  for  which  the  true  Jacobians  are 
difficult  to  derive  (x  {q)proximate.  In  this  procedure,  the  inviscid  flux  Jacobian  matrices  are 
computed  numerically  by  discretization  of  the  inviscid  flux  vector.  Ortega  and  Rheinboldt  [Ref¬ 
erence  21]  refer  to  such  a  mediod  as  a  discretized  Newton  iteration.  Various  finite-difference 
^proximations  have  been  suggested  in  the  numerical  analysis  literature.  A  simple  and  straight¬ 
forward  iq>proximation  of  the  elements  of  the  Jacobian  is  to  replace  Equation  (37)  with 


Fi  (  X  hej  )  -  Fi  (X) 

*ij  h 


(74) 


where  ^  is  the  jth  unit  vector.  Dennis  and  Schnabel  [Reference  22]  point  out  that  this  is  the 
same  as  approximating  the  jth  column  of  the  Jacobian  by 


jth  Column  of  F  ’  (x) 


F  (  X  +  hej  )  -  F(x) 
h 


(75) 


24 


Dennis  and  Schnabel  [Reference  22]  also  point  out  that  if  a  sequence  (hm)  is  used  for  the  step 
size  h,  and  if  this  sequence  is  properly  chosen,  then  the  quadratic  convergence  property  of  New¬ 
ton’s  method  is  retained  and  Newton’s  method  using  finite  differences  is  “vinually  indistinguish¬ 
able”  from  Newton’s  method  using  analytic  derivatives.  However,  a  straight  Newton’s  method 
is  not  used  here  due  to  the  magnitude  of  the  problem  in  three  dimensions,  so  since  quadratic  con¬ 
vergence  is  lost  anyway  it  did  not  seem  finitful  at  this  stage  of  research  to  invest  significant  op¬ 
eration  count  in  sequencing  h.  In  fact,  a  constant  h  of  about  half  the  reliable  digits  of  the  ma¬ 
chine  has  worked  as  well  thus  far  as  using  a  variable  step  size.  For  example,  on  a  Cray  Y-MP  in 
single  precision  (64  bits),  constant  step  sizes  of  h  =  10“^and  h  =  10~*gave  virtually  indistin¬ 
guishable  results.  Therefore,  for  a  constant  step  size,  the  present  suggestion  is 


h  =>/machine  epsilon 


(76) 


It  should  be  noted  that  this  iq)proach  of  using  a  constant  step  size  may  not  be  appropriate  for 
situations  where  there  may  be  large  variations  in  the  magnitude  of  the  elements  of  the  dependent 
variable  vector,  although  it  has  worked  thus  far  for  all  cases  considered.  The  application  of  the 
above  mentioned  procedure  has  been  extended  to  evaluate  the  viscous  flux  Jacobian  matrices  as 
well.  The  viscous  flux  Jacobian  matrices  are  computed  numerically  by  discretization  of  the  vis¬ 
cous  flux  vector. 
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SECTION  V 


SOFTWARE  OPTIMIZATION 

1.  QUASI-BLOCK-TRIDIAGONAL  SOLUTION  MATRIX  STRUCTURE 

This  subsection  is  dedicated  to  the  analysis  of  the  matrix  form  involved  with  the  system 
MAQ  s  R,  i.e.  (Ax  »  b>.  Before  discussing  a  couple  of  solution  methods  fOT  solving  this  kind  of 
a  system,  the  general  appearance  of  the  matrices  will  be  considered  to  see  what  can  be  done  to 
simplify  the  problem.  In  this  subsection,  the  structure  of  the  matrices  which  result  from  solving 
the  fluid  dynamic  equations  in  one,  two,  and  three  dimensions  is  presented.  E)o  not  be  con¬ 
cerned  with  the  actual  entries  in  the  matrices,  as  one  can  easily  go  back  and  see  where  each  indi¬ 
vidual  value  came  from.  The  goal  is  merely  to  get  some  pictures  in  mind  to  set  the  stage  for  tak¬ 
ing  advantage  of  particular  characteristics  of  the  system  matrix  in  the  solution  methods  which 
follow. 

Fot  the  one-dimensional  case,  the  form  of  the  M  matrix  (referred  to  here  as  the  system  ma¬ 
trix)  is  very  simple,  and  is  shown  in  the  following  figure: 


Figure  1.  One-Dimensional  Matrix  Structure 

In  tiiis  case,  all  the  entries  (referred  to  here  as  point-blocks)  are  square  and  of  order  three. 
This  is  what  is  refened  to  as  a  “block-tridiagonal”  matrix  since  there  is  a  main  diagonal  of 
point-blocks,  one  super-diagonal,  and  one  sub-diagonal,  and  the  point-blocks  which  constimte 
diem  are  all  square  and  of  the  same  size.  Obviously,  Gaussian  reduction,  a  block  version  of 
Thomas’  algorithm  approximate  factorization  are  straight  forward  approaches  to  solving  this 
matrix. 

Extending  from  this  one-dimensional  case  to  see  the  two-dimensional  system  matrix  struc¬ 
ture,  flux  Jacobians  in  the  second  dimension  are  arkied  to  the  matrix  structure.  In  Hgure  2,  all 
point-blocks  are  of  order  four  (consovation  of  mass,  two  components  for  conservation  of  mo¬ 
mentum,  and  conservation  of  energy).  The  system  is  now  banded  with  Eve  bands  (refened  to 
here  as  block-pentadiagonal). 
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Figure  2.  IWo-Dimeiisioiial  S^tem  Matrix  Structure 


Finally,  for  the  three-dimensional  case,  thoe  are  now  seven  point-blocks  of  (M’der  five  to 
place  in  the  M  matrix.  Extending  this  system  matrix  structure  of  Figure  2  one  dimension  further, 
it  can  be  shown  that  the  m^tr  :  will  be  of  the  following  form: 


Figures.  Three-Dimensional  System  Matrix  Structure 

In  this  case  the  system  is  block-septadiagonal,  since  there  are  three  block  diagonals  both 
above  and  below  the  main  block  diagonal.  Neither  the  2D  or  3D  systems  {q)pear  to  have  a  very 
“clean”  way  of  solving  them,  and  the  question  of  stwage  immediately  comes  to  mind,  since 
there  are  a  tremendous  number  of  zeros  which  must  be  kept.  Even  for  a  relatively  small  2D  ge¬ 
ometry,  it  is  not  unusual  to  find  a  system  with  a  few  hundred  block  rows  and  coliunns.  That  im- 


plies  a  few  megawords  of  storage  to  hold  the  entire  matrix,  and  perhaps  less  than  1%  of  that 
would  (initially)  be  non-zero  values. 

The  overriding  question  here  is  “How  can  solution  concurrency  be  extracted  from  this  kind 
of  system  without  a  monumental  memory  or  solution  convergence  penalty?”.  Clearly,  approxi¬ 
mate  factorization  is  attractive  with  regard  to  reducing  the  storage,  but  are  there  other  methods 
which  may  take  advantage  of  the  banded  nature  of  the  system,  without  a  huge  cost  penalty  in¬ 
volved  in  storing  all  of  the  zeros?  The  researdi  approach  taken  here  involves  analyzing  the  sys¬ 
tem,  by  observing  the  potential  for  making  it  “quasi-block-tridiagonal”  as  shall  be  discussed  be¬ 
low. 

Obviously,  when  dealing  with  the  one-dimensional  case,  the  system  matrix  is  a  straight  for¬ 
ward  block-tridiagonal  system.  Taking  into  account  boundary  conditions  (explicit),  the  system 
matrix  will  take  the  form: 


Figure  4.  One-Dimensional  System  Matrix  Structure  with  Boundary  Conditions 

The  “I”  point-blocks  are  identity  matrices  which  simply  set  the  boimdary  values  to  that  de¬ 
termined  explicitly  and  stored  in  the  b  vector.  Of  course,  it  is  possible  to  have  implicit  boundary 
values  thereby  introducing  additional  point-blocks  or  altering  the  entries  in  point-blocks  shown 
in  this  matrix.  The  simplest  case  (explicit  boundary  conditions)  will  be  treated  here  since  the 
principle  concern  will  be  how  to  find  and  utilize  concurrency  in  these  types  of  systems,  and  the 
boundary  assiunptions  should  not  have  an  effect  on  these  methods. 

Extending  to  two-dimensional  and  three-dimensional  systems,  however,  requires  some 
work  to  get  things  looking  like  a  “block-tridiagonal  system”.  The  simplest  method  is  to  take  the 
M  matrix  with  the  boundary  values  included,  and  simply  “draw  lines”  horizontally  and  vertically 
through  the  matrix,  partitioning  it  into  a  peculiar  looking  block-tridiagonal  structure.  By  doing 
this,  groups  of  point-blocks  (which  are  matrices)  form  what  wUl  be  referred  to  here  as  matrix 
blocks. 
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Now  a  moment  will  be  taken  to  look  in  some  detail  at  what  it  really  takes  to  partition  a  3D 
problem  from  which  the  M  matrix  is  built  First  look  at  the  very  simple  3D  space,  which  is 
shown  below  in  Hgure  5.  The  space  is  dimensioned  asSx4x4(ixjxk).  In  addition,  it  has 
been  partitioned  into  four  planes  running  in  the  ij-plane.  The  reason  for  this  is  because  of  the 
type  of  software  implementation  which  is  typically  used  for  assembling  the  M  matrix. 


Figure  S.  Three-Dimensional  Computational  Space 


The  dots  on  the  surfaces  (many  of  them  are  hidden  behind  the  front-Hiiost  plane)  are  the 
points  (cells)  where  the  fluid  solution  is  sought  Thiditionally,  most  3D  curvilinear  spaces  - 
associated  with  the  solvers  developed  in  association  with  this  study  are  set  up  so  that  the  max  j 
and  k  indices  are  small  in  relation  to  the  i  index.  It  is  common  to  establish  the  point  distribution 
with  a  i » j  >  k  hioarchy  in  mind.  An  example,  in  “pseudo-code”,  of  the  way  these  points 
(equations)  are  assembled  for  processing  is  shown  below. 

for  k=l  to  4 
for  j=l  to  4 
for  1=1  to  5 

Position  this  point’s  equation  as  the  next  row  of  the  M  matrix, 
end 
end 
end 
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The  M  matrix  which  results  from  this  type  of  arrangement  is  shown  in  Figiue  6.  This  is  the 
system  which  must  be  “block-tridiagonalized”.  In  this  case,  the  block-tridiagonal  structure  is 
found  by  simply  drawing  vertical  and  horizontal  lines  through  the  matrix  as  shown  in  the  figure. 
The  system  has  been  partitioned,  and  the  block-tridiagonal  structure  should  be  obvious.  There 
are  several  other  ways,  obviously,  to  arrange  the  three  loops,  but  all  of  them  result  in  a  similar 
block-tridiagonal  arrangement 


30 


Hgure  6.  lypical  Thre^Dimensional  System  Matrix  Structure  (Tridiagonalized) 
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Next,  concepts  developed  to  extract  natural  concurrency  Hner  than  what  has  been  viewed  for 
traditional  solution  methods  will  be  presented.  The  purpose  of  searching  for  fine-grained  con¬ 
currency  has  been  to  provide  opportunities  for  additional  vectorization,  such  as  what  would  be 
done  on  any  Cray  vector  processor.  The  central  theme  here  involves  the  use  of  the  concept  of 
“diagonal  plane  processing”  for  matrix  ordering  [References  5, 18]. 

The  technique  is  to  take  a  2D  or  3D  space,  and  chose  the  ordering  of  points  in  the  M  matrix 
in  such  a  way  as  to  make  the  problem  naturally  appear  as  a  quasi-block-tridiagonal  system. 

This  all  boils  down  to  traversing  through  a  3D  space  on  a  diagor  '  plane  in  which  none  of  the 
points  in  the  plane  are  direcdy  effecting  any  other.  In  the  2D  case,  this  is  analogous  to  traversing 
a  diagonal  line,  and  grouping  the  points  falling  on  that  line  for  concurrent  processing.  Here  a 
brief  look  will  be  taken  at  each  case. 

In  the  2D  case,  consider  only  the  i  and  j  directions.  In  Figure  7  below,  consider  a  very  small 
2D  space  that  has  been  partitioned  into  a  8  x  4  grid  of  interior  points  (cells).  In  addition,  the 
boundary  conditions  (haloed  points)  are  included,  resulting  in  a  10  x  6  system,  since  the  first  and 
last  rows  and  columns  represent  the  boundary  conditions.  Also  shown  are  every  other  diagonal 
line  cutting  the  domain  representing  about  half  of  the  IS  cutting  lines. 


Figure  7.  TWo— Dimensional  Computational  Space  with  Diagonal  Line  Grouping 

It  can  be  shown,  since  all  communication  between  points  (from  a  typical  difference  stencil) 
is  done  in  the  i  and  j  directions,  no  points  lying  on  the  same  diagonal  line  are  in  direct  commu¬ 
nication.  As  a  result  of  this,  when  looking  at  adjacent  lines,  the  problem  has  reduced,  in  essence, 
to  a  one-dimensional  problem.  The  result  of  this  shows  up  in  the  M  matrix  organization,  as 
shown  in  Hgure  8.  Again  the  identity  point-blocks  represent  explicit  boundary  conditions.  In 
this  case,  one  takes  the  points  from  the  top  of  each  diagonal  line,  and  works  down,  starting  from 
line  1  and  working  through  line  IS.  The  blocks  on  the  diagonal  do  not  appear  to  be  square.  This 
is  because  the  characters  are  taller  than  they  are  wide.  However,  the  center  blocks  are  all  square, 
as  is  the  overall  matrix. 
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Figure  8.  Iwo-Dimensional  System  Matrix  Usmg  Diagonal  Line  Ordenng 


This  particular  form  is  cmnmonly  referred  to  as  **quasi-block-tridiagonal”,  or  sometimes 
**block-aidiagonal  with  unequal  matnx-block  sizes”.  Systems  which  occur  in  these  forms  are 
common.  In  addition,  the  resulting  banded  system  shown  in  Hgure  8  has  been  “blocked”  to 
show  die  block-tridiagonality  in  the  system  matrix. 


Now  consider  the  3D  case.  The  first  thing  needing  to  be  done  is  to  get  a  good  picture  in 
mind  of  what  the  3D  space  looks  like  when  partitioned  into  diagonal  planes.  The  picture  below. 
Figure  9,  shows  the  5  x  4  x  4  space  ( i  x  j  x  k  )  which  has  all  of  the  diagonal  planes  shown.  The 
origin  of  the  space  is  found  in  the  lower  left  hand  comer,  where  the  number  “1”  appears.  Also, 
the  diagonal  plane  intersections  with  the  computational  domain  have  been  numbered  from  1 
through  11,  with  die  first  and  last  simply  being  single  points  in  the  comers. 
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Figure  9.  Diagonal  Planes  for  5  x  4  x  4  Computational  Space 

It  may  be  a  little  helpful  to  look  at  each  diagonal  plane,  so  that  the  shape  can  be  clearly 
seen,  see  Figure  10.  Note  how  many  points  are  contained  on  die  perimeter  and  within  each  cut¬ 
ting  plane.  The  points  on  the  perimeter  could  represent  the  boundary  points  for  example.  By 
examining  these  points,  the  manner  in  which  the  boundary  values  fall  into  the  M  matrix  can  be 
seen. 


l&ll  2&10  3&9  4&8  5&7 

Figure  10.  Individual  Diagonal  Planes 
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The  resulting  matrix  form  is  only  slightly  more  complex  than  in  the  2D  case.  In  this  case, 
the  points  will  be  taken  from  consecutive  planes  1  through  11.  >^thin  each  plane  the  points  will 
be  ordered  according  to  the  following  portion  of  “pseudo-code”: 

forSUM=3tol4 
for  k=l  to  4 
forj=l  to  4 
for  i=l  to  S 

if  i+j+k  equals  SUM  then  include  the  informadon 
(equation)  for  that  point  in  the  next  row  of  the  matrix, 
end 
end 
end 
end 

Since  all  of  the  points  on  a  given  diagonal  plane  have  indices  which  add  up  to  a  constant 
(the  plane  number  plus  two),  the  above  code  will  start  with  the  first  diagonal  plane  (which  is  a 
single  point  at  die  origin,  with  constant  value  of  1-f  1+1  =  3)  and  look  for  all  points  in  the  space 
which  fall  on  each  particular  plane.  That’s  the  role  of  the  first  “for”  loop.  The  next  three  merely 
control  the  order  tiiat  the  points  are  tested.  Using  the  above  algorithm  for  sampling  the  points, 
the  full  block-tridiagonal  matrix  for  the  3D  case  is  very  similar  to  that  for  ZD,  as  shown  in  Fig¬ 
ure  11.  The  off-diagonal  matrix  blocks  are  banded  as  in  the  2D  case,  only  the  bands  are  not  as 
structured. 

It  must  be  kept  in  mind,  that  this  is  merely  tme  possible  arrangement  of  the  data  points  lying 
on  each  diagonal  plane.  Indications  are  that  vimially  all  consistent  sampling  methods  result  in 
systems  which  look  a  great  deal  like  this.  Hiis  particular  technique  of  using  diagonal  plane  or¬ 
dering  seems  to  reduce  the  size  of  the  off-diagonal  matrix-blocks  to  a  nunimum.  This  is  an  im¬ 
portant  consideration  when  dealing  with  any  form  of  solver  in  which  matrix  fill-in  will  occur. 
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Figure  11.  Three-Dimensional  System  Matrix  Structure  Using  Diagonal  Plane  Ordering 
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2.  OPTIMIZING  APPROXIMATE  FACTORIZATION 


In  the  last  subsection,  it  was  shown  that  the  system  matrix  resulting  hrom  the  three-dimen* 
sional  fluid  dynamic  equations  can  be  ordered  such  that  it  appears  as  a  quasi-block-tridiagonal 
system.  At  this  point,  a  direct  solution  technique  which  exploits  this  block  matrix  structure 
could  be  implemented  for  the  inversion  of  the  system  matrix.  Alternately,  approximate  factor¬ 
ization  (AF)  is  a  solution  method  which  has  been  advocated  by  the  authors  for  sometime  now. 
Essentially  AF  breaks  the  M  system  matrix  into  upper  and  lower  triangular  matrices  much  the 
same  as  an  LU  decomposition  (of  a  direct  solver)  would.  The  two-factor  scheme  (two-pass 
scheme  of  Section  IV)  put  forth  sometime  ago  yields  an  upper  triangular  matrix  which  includes 
all  of  the  backward  ruiming  information,  and  a  lower  triangular  matrix  which  includes  all  for¬ 
ward  running  information.  Since  the  original  equation  looks  like  MAQ  =  R,  then  the  resulting 
form  will  be  M''’M~AQ  =  R.  This  is  the  same  sort  of  equation  that  one  gets  from  an  LU  decom¬ 
position  of  a  direct  (block-tridiagonal)  solver,  and  can  be  solved  using  5x5  block  matrix  inver¬ 
sion  coupled  with  backward  (forward)  substitution.  What  is  sought  here  is  to  understand  the 
“communication”  penalty  associated  with  approximating  the  LU  decomposition  one  would  get 
from  a  direct  solver.  Initially,  consider  the  block-tridiagonal  matrix  structure  resulting  from  the 
one-dimensional  equations  as  discussed  previously.  Expanding  the  one-dimensional  equation 
for  an  arbitrary  cell  i  yields, 

(  I  +  A.+  -  Aj+i  +  Af+i  -  Af)  AQ  =  R  (77) 


where  I  is  a  3x3  identity  matrix,  the  A’s  are  3x3  flux  Jacobians  and  AQ  and  R  are  the  solution 
vectOT  and  residual,  respectively.  This  equation  is  used  for  all  cells  in  the  domain,  resulting  in  a 
system  of  equations  for  simultaneous  solution.  In  matrix  form  the  equation  obtained  is 


M  AQ  =  R 

where 


■m„ 

M„ 

0 

0 

Mu 

Ms 

Ms 

0  . 

0 

M„ 

Ms 

Mm  : 

0 

0 

M« 

Mm  : 

0 

0 

0 

Mm  : 

0 

0 

0 

0  • 

(78) 


and  AQ  and  R  are  column  vectors. 

A  standard  block-tridiagonal  LU  decomposition  results  in  the  following  matrix  structure 
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The  first  few  entries  in  the  M  matrix  are  given  by. 

. - . Mh  =  1 

'11^11  - 

I  +  Aj+  - 

- 

Af 

Mj2  —  12  “ 

M21  =  LjiUn  =  -  A+ 

M22  “  L22U22  +  L21U12  =  I  +  —  A2 

^23  =  1-22^23  “  A3 

M32  *  1-32^22  “  ■“  A^ 

M33  =  L33U33  +  L32^23  ”  I  +  A^  —  A3 
^34  ~  ^3^34  “  A4 


(79) 


(80) 


For  the  standard  two-factor  scheme,  the  qierator  of  Equation  (77)  is  approximately  factored 
as  follows, 

(  I  +  Ai+  -  Aiti)  (  1  +  Ai+i  -  Af  )  AQ  =  R 

L  U  AQ  =  R 

The  solution  procedure  for  this  two-factor  scheme  is 

LX^  =  R 
U  X2  = 

AQ  =  X^  =>  Q“+»  =  0“  -H  AQ 

The  structure  for  die  approximate  lower  block  triangular  and  upper  block  triangular  matrices  is 


•  I  +  A*  0  0  0 

■  I  -  A-  A-  0  0 

-  A*  I  +  A*  0  0 

0  I  -  A-  A'  0 

0  -  A*  I  +  A*  0  :  : 

0  0  I  -  A-  A-  :  : 

0  0  -  A^  I  A*  :  : 

0  0  0  I  -  A-  :  : 

0  0  0  -  A^  : 

0  0  0  0  :  : 

0  0  0  0  *  * 

•  •  •  • 

0  0  0  O  ' 

•  •  •  • 

•  •  •  • 

•  •  •  —  • 

(83) 


(81) 


(82) 
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which  leads  to  the  following  entries  in  the  resulting  ^proximate  M  matrix  (operator  product  ma¬ 
trix). 


Mil  -  L„Ui,  -  I  +  A+  -  A,-  -  Ar  Af 
Mi2  =  ~  A2  +  Aj 

M21  =  L21U11  =  —  A*  + 

M22  “  L22U22  +  ^21^12  “  I  +  A^  —  A2  “  A^  ^  ^ 

^23  =  L22U23  =  A3  -I-  A^  ^ 

M32  =  L32U22  “  “  Aj"  +  A^  ^ 

M33  =  ^3^33  +  1^2^23  ~  I  +  A^  —  A3  —  ^  ^  —  A2  A3 
M34  =  L33U34  =  A4  +  A4 


(84) 


The  tenns  which  are  underlined  do  not  appear  in  the  solution  matrix  of  the  direct  solver  and 
hence  are  the  AF  errors.  Note  how  they  are  dispersed  throughout  the  matrix. 

Next,  take  a  look  at  the  modified  two-factor  scheme  (modified  two-pass  scheme  of  Section 
IV).  Rewriting  Equation  (77)  as 

(  Di  -  AjUi  +  Af+i  )  AQ  =  R 

where 

Di  =  I  +  A.+  -  Af 

The  modified  two-factor  scheme  is  given  by  the  following  iqyproximate  factoring 

(  D,  -  A,t, )  D,->  (  Dj  +  Alt,  )  AQ  -  R  .  (85) 

thus 

L  D  U  AQ  =  R 


The  solution  procedure  for  this  modified  two-factor  scheme  can  be  viewed  as  follows 


LX^  =  R 
D  X2  = 

U  X^  =  X* 

AQ  =  X^  =>  0”+^  =  0“  -I-  AQ 


(86) 


The  structure  for  die  qiproximate  lowm*  block  triangular  and  xxpper  block  triangular  matrices  is 
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(87) 


M  »  LDU 
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To  lessen  the  confusion  between  the  system  matrix  D  and  the  point-block  matrices  denoted 
Di  for  each  cell,  define 


L'  =  L  D 


I  0  0 

-  A,*  Df*  I  0 

0  -  D,->  I 

0  0  Dj-‘ 

0  0  0 

0  0  0 

•  *  • 


0 

0 

0 

I 


-  a;  d;' 
0 


thus. 


M  «  L'U 


(88) 


(89) 


which  leads  to  the  following  entries  in  the  resulting  approximate  M  matrix  (operator  product  ma¬ 
trix). 

Mil  =  L'liUn  =  Dj  =  I  -I-  A+  -  Af 

Mj2  ”  LiiUi2  ”  ^2 

M21  =  LiiU„  =  -  A+  Di  =  -  A  + 

M22  ”  L22U22  +  ^1^12  =  I  +  A^  —  A2  —  Ai^  Di  ^  ^ 

M23  =  L22U23  =  A3 

M32  =  L32U22  *  -  ^ 

M33  *=  L33U33  +  1-32^23  “  I  "i"  A^  —  A3  —  A^  D?  ^  ^ 

M34  ”  L33U34  *  A4 


The  terms  appearing  which  are  underlined  are  again  the  AF  errors.  As  can  be  seen,  for  the 
modified  two-^tor  scheme  the  error  terms  only  appear  in  die  du^onal  elements.  Although 
presendy  just  a  speculadve  observation,  it  would  appear  that  the  enor  associated  with  the  modi¬ 
fied  two^actmr  scheme  is  far  less  than  that  of  the  standard  two-factor  scheme.  Defining  the 
block  matrices  Di  as  follows 
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Di  =  I  +  Ai+  -  Af  +  Ajti  Dr_\  Af 


(91) 


all  the  AF  error  is  removed  yielding  a  block-tridiagonal  matrix  direct  solution  procedure  (Tho¬ 
mas’  algorithm).  This  works  well  in  one-dimension  when  the  blocks  are  full,  there’s  no 
memory  penalty.  V^th  a  single  dimension,  the  system  matrix  is  a  matrix  with  block  elements 
each  of  which  is  composed  of  3x3  scalar  entries  (point-blocks).  In  this  simple  case  of  the  one¬ 
dimensional  system  all  of  these  elements  (point-blocks)  are  full,  i.e.  the  diagonal  and  off-diago¬ 
nal  elements  (point-blocks)  are  full  3x3  sub-matrices.  In  multiple  dimensions  there’s  one  ca¬ 
veat.  For  multiple  dimensions,  the  system  matrix  is  a  matrix  with  block  elements  (matrix- 
blocks)  which  are  in  turn  composed  of  block  matrices  (point-blocks)  which  have  NxN  scalar  en¬ 
tries,  where  N  is  the  munber  of  dependent  variables  at  each  point.  The  diagonal  and  off-diago¬ 
nal  matrix-blocks  are  sparse  block-banded  sub-matrices.  It  is  this  sparsity  which  is  desired  for 
an  efficient  solution  procedure.  As  a  matter  of  fact  the  diagonal  structure  of  the  di^onal  block 
(when  using  a  matrix  ordering  based  on  diagonal  plane  processing)  is  what  provides  the  vector 
processing  ability  for  the  AF  methods.  The  lack  of  off-diagonal  point-blocks  in  the  diagonal 
matrix-block  represents  the  local  uncoupled  nature  of  points  lying  on  diagonal  planes,  setting 
the  stage  for  simultaneous  (vector)  processing.  The  point  (2,2),  for  example,  is  influenced  by 
the  point  (3,1)  indirectly.  With  a  direct  solver,  during  die  solution  procedure,  matrix  fiU-in 
creates  an  influence  coefficient  point-block  matrix  (communication  path)  where  previously  there 
was  none  (null  point-block).  These  paths  are  indirect  as  is  indicated  by  the  wide  open  arrows  in 
Figure  12,  and  are  very  numerous  since  all  points  will  influence  all  others  to  some  extent 


Figure  12.  Indirect  Communication  Paths 
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In  the  figure  colinear  (diagonal)  points  are  shown  with  a  dark  dot,  the  (haloed)  point  of  interest 
is  in  direct  communication  with  the  points  to  the  immediate  right  and  left,  and  top  and  bottom. 
Colinear  neighboring  points  are  only  in  indirect  communication.  In  addition,  the  impact  of  the 
solution  of  (2,2)  on,  for  example,  (2,1)  will  again  impact  (2,2)  the  path  of  which  is  indicated 
with  the  solid  arrow. 

Examining  the  two-dimensional  case  a  little  further  will  reveal  more  details  of  this  dilem¬ 
ma.  The  modified  two-factor  scheme  written  for  two  dimensions  is 


(  Dij  -  Aj+ij  -  Bjt_j)  Djji  (  Djj  -I-  Af+ij  +  Bij+i)  AQ  =  R  (92) 

where 


Dij  =  I  +  Ajt  -  Ay  -t- 


K  -  By 


(93) 


Consider  the  system  matrix  resulting  from  a  diagonal  line  ordering. 
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The  diagonal  matrix-blocks  are  diagonal  themselves,  hence  the  local  uncoupled  nature  of 
the  points  lying  on  diagonal  lines  is  readily  seen.  The  question  then  arises,  “How  do  they  get 
coupled?”,  because  it  is  known  that  for  an  implicit  scheme,  such  as  that  used  here,  all  points  in¬ 
fluence  all  other  points.  It  is  this  coupling  which  differentiates  die  direct  solver  from  the 
iqiproximation  techniques.  For  a  direct  solver  during  the  LU  decomposition  process  the  one  of 
the  diagonal  matrix-blocks  belonging  to  either  the  lower  or  upper  matrix  will  no  longer  retain 
this  diagonal  structure,  it  fills  in.  This  fill-in  accomplishes  die  coupling  between  points  lying  on 
a  diagonal  line  (or  plane  in  3D)  but  ruins  any  vectorization  along  diagonal  lines  (or  planes  in 
3D).  If  the  retension  of  no  fill-in  and  the  current  approach  for  vector  processing  is  desired,  then 
some  data  paths  (solution  coupling)  must  be  cut  The  modified  two-factor  seems  to  have  done 
an  excellent  job  of  reconnecting  data  padis  cut  by  the  standard  two-factor  and/or  disconnecting 
faulty  data  paths  created  by  die  standard  two-factor  scheme  in  the  off-diagonal  matrix-blocks 
and  some  in  diagonal  matrix-blocks.  There  exists  some  AF  oror  (data  padi  error)  which  re- 
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mains  in  the  diagonal  matrix-blocks  which  can  be  removed  while  still  retaining  the  no  fill-in 
matrix  structme  enjoyed  with  the  AF  schemes.  To  see  this,  one  can  simply  run  through  the  ma¬ 
trix  products  which  result  firom  the  AF  process  and  compare  to  the  matrix  in  Equation  (94) 
above.  Consider  the  following  AF  matrices 

1.1  U  2,1  u  2.2  3.1 
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Using  row-column  subscripts  (with  no  commas)  to  denote  matrix-blocks,  the  first  few  en¬ 
tries  in  the  resulting  system  matrix  are 
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Biti  ^i.}  Ail' 
Au  ^1,1  Ail 


Examining  the  last  entry  shown  here,  it  can  be  seen  that  the  AF  error  appears  in  a  matrix- 
block  in  the  form  of  point-blocks.  The  off-diagtmal  entries  in  diis  error  matrix-block  can  not 
be  removed  if  no  fill-in  is  desired,  but  the  errors  appearing  on  the  diagonal  can  be  removed  by 
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adding  those  terms  to  the  point-block  D, As  long  as  the  diagonal  matrix-blocks  are  main¬ 
tained  as  diagonal  block  matrices  themselves,  there  is  no  fill-in.  Thus  the  point-block  Djj  is  de¬ 
fined 


Dy  -  I  +  AJ  -  Ay  +  BiJ  -  By  +  Ait,jD-'jAy  +  Bit.,Dyl,By  (98) 
or 

Dy  -  I  +  AiJ  +  BiJ  +  (  At.ipr-U  -  I  )  Ay  +  (  B^.,Dyl,  -  I  )  By  (99) 

for  optimal  solution  coupling  with  no  fill-in.  The  extension  of  this  to  three  dimensions  is 
straight  forward,  yielding 

Dy*  =  I  +  Aji  +  +  Cjk  +  (  -  I  )  Ayj,  (100) 

+  <  -  I  )  By*  +  (  -  I  )  Cyj, 


To  test  the  new  factoring  scheme,  a  simple  two-dimensional  geometry  was  selected.  The 

geometry  is  that  of  a  NACA  64A012  airfoil  at  two  degrees  angle-of-attack  in  a  Mach  0.87  flow. 

An  H-grid  using  two  blocks  (each  71x31),  one  upper  and  one  lower  was  used  to  discretize  the 

domain.  Both  the  upper  and  lower  surfaces  of  the  airfoil  are  modeled  with  thirty  cells. 

« 

The  results  of  the  tests  are  given  in  Figures  13  and  14 .  These  figures  show  a  comparison  of 
the  convergence  histories  for  the  standard  two-factor  (STDAF),  the  modified  two-factor  (MAP), 
and  the  optimized  two-factor  (OAF).  It  can  be  seen  fiom  the  limited  tests  that  were  run  (using 
local  time-stepping)  that  the  sensitivity  to  larger  CFL  numbers  exhibited  by  the  STDAF  scheme 
is  not  {y)parent  with  either  MAF  or  OAF.  Although  this  is  the  case,  for  this  test  configuration, 
both  MAF  and  OAF  seem  to  exhibit  a  less  rapid  convergence  rate  than  the  STDAF  for  a  given 
number.  It  is  important  to  note  that,  although  not  shown,  both  MAF  and  OAF  maintained 
stable  convergence  for  CPUs  in  the  hundreds  while  the  STDAF  went  divergent  early-on  for  a 
CFL  of  one  hundred.  Little  difference  in  convergence  rate  is  seen  between  the  MAF  and  the 
OAF  for  this  particular  configuration.  So  little  difference  in  fact  that  the  extra  computations 
necessary  in  the  OAF  method  are  not  warranted.  This  may  not  be  the  case  for  some  other  geom¬ 
etries  though.  More  convergence  analyses  need  to  be  performed  on  various  geometries  and  flow 
conditions. 
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Figure  13a.  Approximate  Factorization  Operator  Convergence  History  Comparison  (CFL  -  5,  Lower 
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Figure  14a.  Approximate  Factorization  Operator  Convergence  History  Comparison  (CFL  ■  10,  Lower  Block) 


Convergence  History  (IFREQ=1') 

Total  Residue  (upper  block) 
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Cycle 

,  NACA  64A012  Airfoil 

(AoA  -  2“ ,  CFL  -  10) 

Factorization  Operator  Convergence  History  Comparison  (CFL  -  10.  Upper  Block) 


3.  DIAGONAL  PLANE  PROCESSING  SOFTWARE  ANALYSIS 


During  the  course  of  the  software  (and  algorithm)  development  for  the  flow  model  pres¬ 
ented  here  significant  emphasis  has  been  placed  on  the  impact  of  solution  vectorization.  So 
much  in  fact  that  special  matrix  orderings  were  selected  to  “create”  computational  vectors  for 
simultaneous  solution.  This  has  proven  to  be  a  viable  technique  although  there  is  one  known 
shortcoming,  data  latency.  Oay  vector  processors  store  data  in  a  sequence  of  memory  banks. 
Each  machine  is  different  in  the  number  of  memory  banks  it  contains,  usually  some  multiple  of 
eight,  i.e.  8, 16, 32,  etc.  The  memory  also  has  a  specific  reserved  access  time,  referred  to  as  the 
bankbusy  cycle  time  which  is  measured  in  clock  cycles.  If  a  memory  bank  is  accessed  twice 
within  the  bankbusy  cycle  time  the  cpu  must  wait  for  the  data  requested.  This  situation  is  re¬ 
ferred  to  as  a  memory  bank  conflict  and  can  signiricantly  disturb  the  processing  rate  of  an  other¬ 
wise  standard  vector  loop,  see  Reference  30.  Conditions  that  create  memory  bank  conflicts 
which  should  be  avoided  for  standard  (regular  stride)  loops  are  well  documented  in  the  Cray 
manuals.  The  difficulty  here  is  the  loops  associated  with  diagonal  plane  processing  use  indirect 
addressing  which  results  in  an  uneven  stride  and  hence  the  potential  for  unpredictable  memory 
bank  conflicts. 

To  examine  this,  a  test  case  was  devised  which  could  have  the  computational  domain  resized 
automatically  (max  i,  j,  k  indices  varied)  such  that  all  typical  grid  block  dimensions  could  be 
tested  for  memory  bank  conflicts  on  a  particular  machine.  A  grids  block  dimensions  determine 
the  nature  of  the  array  storage  in  the  memory  banks  and  hence  by  parametrically  timing  the  pro¬ 
cessing  rate  (of  the  matrix  inversion  using  diagonal  plane  processing)  of  different  dimension 
blocks,  certain  trends  can  be  observed  to  steer  a  user  away  from  a  particular  grid  size.  To  date 
this  has  been  quite  a  successful  undertaking.  A  parametric  analysis  of  grid  sizes  ranging  firom 
10  <  inux  <  66, 4  <  jmax  <  32,  and  4  <  kni.T  <  24  has  been  accomplished  on  a  Cray  X-MP  with 
32  memory  banks  and  a  bankbusy  value  of  8  clock  cycles.  What  was  observed  from  this  study 
was  that  grid  dimensions  involving  an  in,.x  of  (multiples  of  8)  +  1  created  significantly  more 
memory  bank  conflicts  than  the  other  grids  regardless  of  the  jnm  and  k^ax  values,  see  Figures 
IS.  Grids  with  im.T  of  17,  and  49  showing  moderate  increases  in  processing  time,  and  33,  and 
65  showing  approximately  a  three-  to  four-^old  increase  in  processing  time.  The  complexity  of 
the  operations  involved  in  the  loops  and  the  maimer  in  which  the  C!ray  manipulates  data  prohib¬ 
its  (at  die  present  time)  a  detailed  explanation  as  to  the  source(s)  of  conflicts  for  these  grids.  An 
interesting  potential  corrective  measure  was  devised  though. 
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The  manner  in  which  data  in  the  loops  is  accessed  is  determined  largely  by  the  mt^ping  of 
the  elements  lying  on  the  diagonal  planes,  i.e.  the  matrix-block  structures.  Coplanar  solution 
nodes  are  computationally  independent  and  hence  the  order  in  which  they  are  grouped  for  pro¬ 
cessing  is  of  no  consequence  to  their  simultaneous  processing.  The  sequence  of  data  acquisition 
on  the  other  hand  is  greatly  effected  by  this  ordering.  The  simple  mqjping  process  described  in 
Part  1  of  this  section  was  used  in  this  study,  Le.  KJl  looping  for  a  sampling  routine.  The  inter¬ 
esting  partial  fix  for  grids  with  an  imax  dimension  creating  memory  bank  conflicts  is  to  simply 
change  the  order  in  which  the  nodes  are  sampled  to  determine  which  plane  they  belong. 
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Hgine  16.  Alternate  Diagonal  Plane  Mappings  of  Points  Lying  on  Plane  Six 


Hgoie  16  shows  thiee  different  mappings  of  die  fiauiteen  points  which  lie  on  plane  number  six 
of  die  sample  domain  of  Part  1  of  this  sectkm.  The  starting  location  fra- each  different  mapping 
is  the  large  point  Each  point  mapping  (order)  is  listed  to  the  side  of  the  diagram.  To  establish 
the  point  order  simply  follow  die  arrows.  Presmit  indicatimis  reveal  diat  moving  the  i  lo(^  as  the 
outermost  loop  of  the  sampling  (mai^g)  process  the  memmy  bank  cmiflict  phenomena  could 
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be  altered.  For  example,  the  grid  with  imax  of  33  which  previously  posed  a  significant  memory 
bank  problem  could  be  fixed  simply  by  shifting  to  IKJ  looping  in  the  mapping  process.  Figure 
17  shows  the  processing  time  of  a  small  window  around  the  imax  of  33  grid  size.  It  is  obvious 
that  imax  of  33  no  longer  experiences  memory  bank  conflicts  to  the  extent  seen  for  KJI  looping. 
Although  now  grids  with  jmax  indices  of  17  are  hindered  with  conflicts  as  well  as  imax  of  32. 
Additional  data  must  be  collected  to  determine  reliable  corrective  measures  for  averting  memory 
bank  conflicts. 
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Figure  17.  Matrix  Inversion  CPU  Time  per  Grid  Point  (IKJ  Looping,  K-constant  Plane) 


SECTION  VI 


RESULTS  AND  DISCUSSION 
1.  HYPERSONIC  BLUNT  NOSE  CYLINDER 

Verification  of  the  acoaracy  of  any  prc^aosed  numerical  method  of  calculating  the  heat- 
transfer  rate  to  the  surface  of  a  hypersonic  re-entry  vehicle  invariably  dqpends  upon  its  subse¬ 
quent  agreement  with  experimental  data.  Therefore,  in  (»xier  to  test  and  demonstrate  the  accura¬ 
cy  and  efficiency  of  the  present  unsteady  TLNS  flow  solver,  the  flow  over  two  basic 
axisymmetric  blunt-body  configurations  was  investigated.  The  first  configuration  was  a 
15°-semiiq}ex  spherical,  blimt  cone  with  a  bluntness  ratio  (ratio  of  nose  radius  to  base  radius)  of 
0. 183.  Model  base  diameter  was  one  foot.  The  second  configuration  used  in  this  investigation 
was  a  spherically  capped,  9**  half-angle  cone.  A  C-type  grid  system  is  used  to  discretize  the 
computation  domain  about  these  configurations.  The  second-order  flux-difference  split  scheme 
with  Van  Leer  limiter  was  used  for  the  present  wm-k.  A  laminar  numerical  solution  flow  condi¬ 
tion  of  Mach  10.6,  wall  to  stagnation  temperature  ratio  CIw/To)  of  0.30,  and  free-stream  unit  Re¬ 
ynolds  number  of  1.2  x  10^  per  foot  for  the  first  configuration  was  obtained.  Figure  18  gives  a 
comparison  of  measiucd  data  [Reference  31]  and  computed  axial  distribution  of  laminar  heat- 
transfer  for  the  first  configuration.  As  seen  in  this  figure,  the  computed  rates  of  heat-transfer 
show  excellent  agreement  with  the  experimental  data.  Unfortunately,  there  are  no  surface  pres¬ 
sure  and  turbulent  heat-transfer  experimental  data  available  for  comparison  for  this  model. 
Therefore,  the  second  configuration  was  used  for  a  better  understanding  of  the  salient  features  of 
the  turbulent  flowfield  structure  and  determination  of  turbulent  heat-transfer  on  axisymmetric 
bltmt  nosed  bodies  in  hypersonic  flow.  The  numerical  solutions  wo'e  generated  at  5.0  and  10.6 
free  stream  Mach  number  for  the  above  mentioned  second  configuration.  The  wall  to  stagnation 
temperature  ratio  (Tn/To)  used  was  0.23  and  0.3  respectively.  Comparison  of  measured  data 
[Reference  32]  and  computed  axial  distribution  of  stuface  pressture  and  turbulent  heat-transfer 
on  the  hemispherical  9**  half-angle  cone  at  a  free  stream  Mach  number  of  5.0  and  Reynolds 
number  based  on  nose’s  diameter  of  43.92  x  10^  are  given  in  Figures  19  and  20  respectively. 
Although  the  computed  pressure  results  yield  good  agreement  with  the  experimental  data,  the 
potM*  agreement  between  computed  and  measured  turbulent  heat-transfer  results  are  noticeable. 
Howevo*,  overall  trends  are  clearly  ci^tured  qualitatively.  The  flowfield  about  this  geometry 
was  also  computed  at  free-stream  Mach  number  of  10.6  and  Reynolds  number  of  12.0  x  10^  per 
foot.  Figures  21  and  22  illustrate  the  computed  surface  pressure  and  turbulent  heat-transfer 
comparison  with  experimental  data  fex  this  configuration.  Dearly  shown  in  these  figures  is  that 
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the  computed  surface  pressure  and  turbulent  heat-transfer  are  quantitatively  and  qualitatively  in 
good  agreement  with  the  available  experimental  data. 
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2.  BODY-STORE 


The  applicability  and  performance  of  the  present  flow  solver  was  also  tested  for  the  flow 
over  a  generic  two-dimensional  planar  body-store  configuration  at  zero  degree  angle  of  inci¬ 
dence  and  free  stream  Mach  number  of  2.5  and  Reynolds  number  based  on  body  thickness  width 
of  5.80  X  10®,  the  viscous  clustering  yielded  an  average  y*  value  of  4  for  the  first  grid  point  off 
the  body  surface.  Both  body  and  store  consist  of  an  ogival  forebody,  a  flat  midsecdon  and  an 
ogival  afterbody.  Figure  23  displays  this  geometry.  A  H-type  grid  system  is  used  to  discretize 
the  domain  around  the  body-store  configuration.  The  computational  region  for  this  configura¬ 
tion  is  divided  into  three  blocks.  Each  block  has  a  total  of  5994  (81  x  37  x  2)  points,  for  a  sum 
total  of  17982  points  for  the  entire  field  grid.  Steady  state  multiblock  solutions  are  demonstrated 
by  the  flow  about  the  body-store  configuration  with  the  store  in  captive  position.  Unsteady  dy¬ 
namic  multiblock  solutions  are  demonstrated  by  computing  the  flow  of  the  complete  multibody 
configuration  as  the  store  moves  away  from  the  parent  body  configuration  through  a  prescribed 
vertical  launch  trajectory.  The  steady  state  calculation  was  started  with  iiutial  conditions  of  free 
stream  values  everywhere,  and  then  2000  iterations  of  local  time  stepping  with  the  CFL  number 
equal  to  3.5  was  used  to  establish  the  steady  state  flow  field  before  unsteady  motion  was  started. 
The  unsteady  calculation  starts  after  2(XX)  iterations  of  the  steady  state  calculation.  The  plunge 
velocity  of  the  store  moving  away  firom  body  was  then  suddenly  set  to  a  predetermined  velocity 
and  held  constant  thereafter,  moreover,  the  local  time  step  was  replaced  by  a  fixed  time  step,  the 
same  at  all  points,  when  the  unsteady  calculation  began.  The  unsteady  numerical  solutions  were 
continued  until  the  store  moved  to  a  point  four  body  widths  away  from  the  body.  The  overall 
pressure  distribution  of  the  body-store  configuration  as  the  store  moves  away  from  the  body  is 
depicted  in  Figure  24.  Figure  24-a  demonstrates  the  steady  state  computation  with  the  store  in 
captive  position  at  the  instant  of  vertical  launch,  which  becomes  the  initial  condition  for  the  un¬ 
steady  simulation.  Moreover,  Figure  24-a  shows  that  the  flow  accelerates  toward  the  shock 
wave  on  the  lower  flat  portion  of  the  body  and  then  decelerates  after  crossing  the  shock;  this  in¬ 
teraction  is  strong  enough  to  cause  flow  separation.  Figures  24-b,  c,  d,  e  demonstrate  the  un¬ 
steady  computation  when  the  store  is  mooting  through  the  points  of  0.5, 1.0, 1.5,  and  2.0  body 
widths  away  from  the  body.  In  Figure  24  the  viscous  turbulent  results  are  also  compared  with 
inviscid  solutions  at  the  same  sequence  of  the  store’s  position  relative  to  the  body.  The  evidence 
of  viscous  effect  is  mostly  seen  in  the  region  between  the  store  and  body,  particularly  when  the 
store  is  near  the  body.  Note  that  the  greatest  difference  in  inviscid  and  viscous  results  is  the  pre¬ 
diction  of  the  shock  strength  and  position  near  the  nose  of  the  store.  Shock  strength  is  larger  and 
shock  position  is  farther  downstream  when  the  Euler  model  is  used.  Unfortunately,  there  are  no 
experimental  data  available  for  comparison  with  the  numerical  results. 
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Surface  Pressure  Distributions 

M=5.0,  Re=18.3*(10**6),  TwrTo=0.23 


Figure  19.  Computed  and  Measured  Surface  Pressure  Distributions  on  Blunt  (9®  half-angle  cone) 

Body  at  M  ■  5.0,  Rp  *  18.3  x  10^ 


Turbulent  Heat-Transfer  Distributions 

M=5.0,  Re=18.3*(10**6),  Tw/To=0.23 
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Figure  20.  Computed  and  Measured  Surface  Heat-Transfer  Distributions  on  Blunt  (9°  halfrangle^cone). 

Body  at  M  -  5.0,  Rp  -  18;3  x  10^ 


Surface  Pressure  Distributions 

M=10.6,  Re=12.0*(10**6),  T/To=0.30 
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Figure  21.  Computed  and  Measured  Surface  Pressure  Distributions  on  Blunt  (9°  half-angle  cone) 

Body  at  M  -  10.6,  Rg  -  12.0  x  10^ 


Turbulent  Heat-Transfer  Distributions 

M=1 0.6,  Re=1 2.0*{1 0**6),  Tw/To=0.30 
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Figure  22.  Computed  and  Measured  Surface  Heat-Transfer  Distributions  on  Blunt  (9“  half,-a,ngle. 
.  Body  at  M  -  10.6,  Rp-  12-.0  x  10^ 
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Figure  23.  Perspective  View  of  the  Body-Store  Configuration 
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Figure  24.  Computed  Overall  Pressure  Distribution  Contours  with  Store  Located 
in  Captive  Position  and  Moving  through  0.5,  1-0,  1.5,  and  2.0  Body 
Widths  Below  the  Body 


Figure  24.  Continued 
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CONCLUDING  REMARKS 

Numerical  methods  for  fonnulating  and  solving  the  unsteady  three-dimensional  compress¬ 
ible  Euler  and  Navier-Stokes  equations  have  been  presented.  In  addition,  an  analysis  of  the  mo¬ 
dified  two-pass  numerical  solution  method  has  been  performed  and  presented  in  terms  of  soft¬ 
ware  optimization.  The  guidelines  for  the  flow  conditions  of  interest  in  this  research  was  a 
fieestream  Mach  number  of  10  and  below  at  an  altitude  of  100,000  feet  and  below.  For  such  a 
flow  regime  the  influence  of  chemistry  can  be  important  As  such  there  is  also  an  additional  in¬ 
vestigation  into  the  numerical  formulation  and  solution  of  the  equations  including  equilibrium 
chemistry.  This  effort  is  repotted  in  a  separate  report  as  another  phase  of  this  same  research  ef¬ 
fort 
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